Phytofluors as fluorescent labels

ABSTRACT

This invention provides new fluorescent molecules useful for detection of target entities. In particular, it relates to fluorescent adducts comprising an apoprotein and a bilin.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is related to U.S. Ser. No. 08/904,871, filed Aug. 1, 1997 (also published as WO 98/04700), which is a continuation-in-part of U.S. Ser. No. 60/023,217, filed on Aug. 2, 1996, both of which are incorporated herein by reference for all purposes.

[0002] This work was supported by grants (MCB 92-06110, MCB 96-04511) from the National Science Foundation, a grant (GAM9503140) from the US Department of Agriculture, and by NIH training grant (5 T32 GM07377-17). The Government of the United States of America may have certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention relates to new fluorescent molecules useful for detection of target entities. In particular, it relates to fluorescent adducts comprising an apoprotein and a bilin.

BACKGROUND OF THE INVENTION

[0004] The phytochromes comprise a family of biliprotein photoreceptors which enable plants to adapt to their prevailing light environment (Kendrick and Kronenberg (1994) Photomorphogenesis in Plants, 828 pp., Dordrecht, The Netherlands: Kluwer Academic Publishers). All phytochromes possess the ability to efficiently photointerconvert between red light absorbing Pr and far red light absorbing Pfr forms, a property conferred by covalent association of a linear tetrapyrrole (or bilin) with a large apoprotein. Phytochromes from cyanobacteria to green algae and higher plants consist of a well conserved N-terminal polypeptide, roughly 390-600 amino acids in length (see FIG. 6 of WO 98/04700), to which the bilin prosthetic group phytochromobilin (PΦB) or phycocyanobilin (PCB) is bound.

[0005] The N-terminal domain of the phytochrome apoprotein is sufficient for spontaneous covalent attachment of ethylidene containing linear tetrapyrroles, a process requiring neither cofactors nor additional enzymes (Li et al. (1992) J. Biol. Chem., 267: 19204-19210). In higher plants, PΦB is bound to a conserved cysteine residue within the phytochrome apoprotein via a linkage identical to that found in the phycobiliprotein photosynthetic antennae of cyanobacteria, red algae and cryptomonads. The ability of the phytochrome photoreceptor to self assemble with its bilin prosthetic group contrasts with the phycobiliprotein photoreceptors which require separate enzymes for proper bilin attachment (Glazer (1989) J. Biol. Chem., 264: 1-4). Owing to the efficient photointerconversion between Pr and Pfr forms, phytochromes are poorly fluorescent molecules, unlike the phycobiliproteins which are intensely fluorescent and have been exploited as useful probes (see, e.g., U.S. Pat Nos. 4,857,474, and 4,520,110).

[0006] Fluorescent markers have found uses in molecular biology as labels for nucleic acid probes, antibodies, and other specific binding ligands in the detection of particular target moieties (e.g., particular nucleic acid sequences, receptors, etc.). Labeled binding molecules are used both in vitro and in vivo as diagnostic indicators and as research tools. Consequently there has been considerable interest and research on the development of fluorescent indicators.

[0007] Typically biological macromolecules (e.g., proteins or oligonucleotides) are labeled with a fluorescent marker (e.g., fluorescein, rhodamine, umbelliferone, and lanthanide chelates) either directly through a covalent linkage (e.g., a carbon linker), or indirectly whereby the macromolecule is bound to a molecule such as biotin or digoxigenin, which is subsequently coupled to a fluorescently labeled macromolecular binding moiety (e.g., streptavidin or a labeled monoclonal antibody). Fluorescein and rhodamine are among the most commonly used fluorophores since they are readily available in an activated form for direct coupling to antigens or antibodies. Both fluorescein and rhodamine show good chemical stability and have a proven record in actual use as labels. However, macromolecules labeled with these fluorophores suffer from chemical quenching of fluorescence, and it is difficult to control the labeling of discrete sites within the macromolecule.

[0008] These fluorescent labeling systems also suffer the disadvantage that the fluorescent complexes and/or their binding moieties are relatively large, and must be prepared and supplied from an exogenous source because most organisms are not capable of synthesizing these molecules. In addition, these molecules are often toxic to the subject organism.

[0009] With only one exception, the Green Fluorescent Protein (GFP) from the jellyfish Aequorea victoria (U.S. Pat No. 5,491,084), the ability to synthesize a fully functional fluorescent macromolecule has been restricted to the host organism in which the protein naturally occurs. Because the nucleic acid encoding GFP can be cloned into a cell and expressed to yield a non-fluorescent protein precursor that spontaneously assembles its own fluorophore, GFP has gained widespread utility as a selectable marker and a probe of cellular events (Cubitt et al. (1995) Trends In Biochem. Sci. 20, 448-455). From many attempts to improve the properties of GFP through genetic engineering, it is clear that there is a finite spectral window within which GFP is useful as a fluorescent marker. The development of additional protein-based fluorescent markers that can be functionally expressed in various cell types by standard genetic engineering techniques with an extended fluorescence wavelength range, and a variety of useful biochemical properties is desirable.

[0010] A recent development in the field of fluorescent labeling has been the use of phycobiliprotein conjugates. Phycobiliproteins are a class of highly fluorescent proteins that form a part of the light-harvesting system in the photosynthetic apparatus of bluegreen bacteria and of two groups of eukaryotic algae, red algae and the cryptomonads. A particularly useful variation of their use comprises preparation of a phycobiliprotein tandem conjugate with a large Stokes shift. An example of such a conjugate is the covalent attachment of the phycobiliproteins, phycoerythrin and allophycocyanin. The resulting tandem conjugate has a large Stokes shift with an emission maximum at 660 nm and an excitation waveband that starts at about 440 nm. However, production of such tandem complexes requires human intervention in the formation of a covalent or other chemical bond between the two components, therefore increasing the complexity of the production of the final conjugate.

[0011] Despite these advances, the art fails to provide fluorescent markers that can be easily produced and readily engineered to provide strong fluorescent signals over a wide range of wavelengths. The present invention addresses these and other needs.

SUMMARY OF THE INVENTION

[0012] This invention provides a new class of fluorescent protein adducts (phycobilin conjugates) that are generally suitable for use as fluorescent markers. Owing to their long wavelength absorption maxima, their high molar absorption coefficients and the ability of recombinant phytochrome apoproteins to spontaneously assemble with a variety of bilin chromophore precursors, the phytochromes are potentially ideal fluorescent markers.

[0013] Phytochromes perform a key role as light sensors in most photosynthetic organisms, via photoisomerization of the covalently bound phytochromobilin or phycocyanobilin prosthetic group which induces a protein conformational change and subsequent signal transduction cascade. The adduct between recombinant apophytochrome and phycoerythrobilin (PEB), the natural chromophore precursor of phycoerythrin, is highly fluorescent because it lacks the double bond required for photoisomerization. This invention demonstrates that fluorescent apophytochrome-bilin conjugates (e.g., apophytochrome-PEB adducts), which are referred to herein as the “phytofluors”, are intensely fluorescent, photostable proteins useful as probes for biological research.

[0014] In a preferred embodiment, the fluorescent adducts (i.e., phytofluors) of this invention comprise a protein component (an apoprotein) and a nitrogen heterocyclic compound (e.g., a polypyrrole). In a preferred embodiment the nitrogen heterocycle is a dipyrrole, tripyrrole, tetrapyrrole, or analogues thereof, with linear tetrapyrroles and analogues thereof being most preferred. In some embodiments, higher order pyrroles and their analogues can also be used. One particularly preferred bilin is phycoerythrobilin (PEB). The apoprotein is preferably an apophytochrome or analogue thereof. Preferred analogues are recognized by and thus comprise the consensus sequence discussed above. The apoprotein can be derived from vascular and non-vascular plants, green algae, bacteria or cyanobacteria, or can be chemically synthesized de novo. Thus, preferred apoproteins are encoded by plant genes, algal genes, bacterial genes, or cyanobacterial genes. Particularly preferred apoproteins include any of the apoproteins described herein or those listed in the sequence listing or conservative substitutions of these sequences, while most preferred apoproteins include apoproteins from plants (e.g., oats with an apoprotein having about 1100 amino acid residues), green algae (e.g., Mesotaenium caldariorum), or cyanobacteria (as illustrated in the sequence listing), or related proteins having conservative substitutions. Truncated apoproteins consisting of a chromophore domain (the apoprotein N-terminal subsequence sufficient for lyase activity) are particularly preferred. One preferred N-terminal subsequence consists of less than about 600 N-terminal amino acids, more preferably less than about 400 N-terminal amino acids, and most preferably about 200 N-terminal amino acids.

[0015] In one preferred embodiment, this invention provides for a moiety that is labeled with one or more of the fluorescent adducts of this invention. The fluorescent adduct is attached covalently, or non-covalently, directly, or through a linker to a moiety that is to be labeled. The moiety can be virtually any composition, including for example, a biological molecule (biomolecule), an organelle, a cell, a tissue, virtually any naturally occurring or synthetic material that is chemically compatible with the fluorescent adduct, and even an article of manufacture. In a particularly preferred embodiment, the fluorescent adducts of this invention are attached to biological molecules including, but not limited to, proteins, carbohydrates, lipids, and nucleic acids. Particularly preferred biological molecules are members of binding pairs (binding partners) that specifically bind to a target molecule. Preferred members of binding pairs include antibodies, nucleic acids, lectins, enzymes, ligands, receptors, and the like.

[0016] The fluorescent adduct can be joined to the moiety to be labeled either by attachment to the bilin or by attachment to the apoprotein, with attachment to the apoprotein being most preferred. The apoprotein can be chemically conjugated to the subject (labeled) molecule or, where the subject moiety is a protein or contains a protein component, the apoprotein can be fused to the amino or carboxyl terminus of the protein or protein component through a peptide bond thereby forming a fusion protein. The fusion protein can also be a recombinantly expressed fusion protein. Alternatively, the apoprotein can be joined to the protein or protein component of the subject moiety through linkages between side chains (e.g., a disulfide linkage between cysteines).

[0017] This invention also provides methods of use for the above-described fluorescent adducts and for the compositions comprising a moiety joined to any of the fluorescent adducts described above or herein. Thus, for example, in one embodiment, this invention provides for a method of testing the presence of a biomolecule in a sample. The method involves providing a sample comprising a biomolecule linked to a fluorescent adduct consisting of an apoprotein and a bilin chromophore and contacting the sample with light which causes the fluorescent adduct to emit light, and detecting the emitted light thereby detecting the presence of the biomolecule. In one particularly preferred embodiment, the sample is contacted with light having a wavelength of about 570 nm. The step of detecting the emitted light may include detecting light having a wavelength of about 590 nm. In a particularly preferred embodiment, the biomolecule is one or more of any of the above-identified biomolecules.

[0018] This invention also provides methods of expressing and detecting a selectable marker. These methods include providing a nucleic acid that encodes a protein of interest and any of the apoproteins described above and herein. The expressed apoprotein is contacted with a bilin, more preferably one of the bilins described above or herein to form a fluorescent adduct. Finally, the fluorescent adduct is contacted with light which causes the fluorescent adduct to fluoresce emitting light which is then detected thereby indicating the presence of the selectable marker.

[0019] In still yet another embodiment, this invention provides a method of detecting and/or quantifying protein-protein interactions. The two subject proteins are each expressed in fusion with or conjugated to an apoprotein. The apoproteins are selected such that, when combined with their respective bilins, they form a first and a second fluorescent adduct, respectively. The first adduct fluoresces at a wavelength absorbed by the second adduct which then emits at a different wavelength. Exposure of the proteins to light causes the first fluorescent adduct to emit light that is transferred to the second fluorescent adduct which then emits light at a different wavelength thereby indicating that the two proteins are in close proximity. This invention also provides for numerous other variants of this assay which are disclosed herein.

DEFINITIONS

[0020] The term “fluorescent adduct” refers to a fluorescent molecule (i.e., one capable of absorbing light of one wavelength and emitting light of a second wavelength) comprising an “apoprotein” (also referred to as an apophytochrome) component joined to a “bilin” component, both of which are described below. The fluorescent phytochrome-bilin conjugates (e.g., phytochrome-PEB adducts), are also referred to herein as “phytofluors”. The manner in which the two components are joined to form an adduct is irrelevant to the present invention. Typically, the two components spontaneously form an adduct through covalent interactions. The components may also be deliberately linked through covalent bonds (e.g., through the use of crosslinking reagents). The fluorescent adducts of this invention do not require pairing of an apoprotein with its corresponding native bilin. To the contrary, the invention contemplates adducts consisting of naturally occurring or engineered apoproteins with bilins derived from different organisms, or with non-naturally occurring synthetic linear pyrroles.

[0021] The terms “apoprotein”, “apophytochrome”, or “apoprotein polypeptide”, as used herein, refer to polypeptides derived from eukaryotes, such as vascular plants, non-vascular plants, and algae, or from prokaryotes, such as cyanobacteria, or other eubacteria and archaebacteria. The term encompasses both naturally occurring apoproteins and variant polypeptides derived through mutagenesis. The apoproteins have a hydrophobic pocket, referred to as chromophore or bilin binding site, capable of forming an adduct with a bilin component. The prototypical eukaryotic apoproteins of the invention are typically homodimeric proteins about 1100 amino acids in length, each subunit being composed of two major domains. The globular 70 kD N-terminal domain contains the hydrophobic pocket, while the more elongated 55 kD carboxyl terminal domain contains the sites at which the two subunits are associated. Apophytochromes containing a bilin binding site can be readily identified by one of skill in the art by comparison of the polypeptide sequence in question with the apophytochrome consensus sequence discussed above using standard sequence comparison methodologies. For a general discussion of apoprotein structure and function, see, Quail et al. (1997) in Plant Cell and Environment, 20: 657-665.

[0022] The preferred apoproteins of the invention typically consist essentially of a chromophore domain. The terms “chromophore domain” or “minimal chromophore domain” or “lyase domain” refer to the apoprotein N-terminal subsequence sufficient for lyase activity, that is, sufficient to form a covalent bond between the apoprotein and the bilin. Lyases are enzymes that catalyze the reversible formation of a covalent adduct between a hydroxyl- or thiol-containing substrate and a substrate containing a double bond (i.e., addition of a nucleophile to a double bond). Chromophore domains are typically between about 180 and about 250 amino acids, typically between about 190 amino acids and about 220 amino acids, and usually about 200 amino acids in length (e.g., 197 amino acids). Typically, this spontaneous assembly results in the formation of a phytofluor.

[0023] The apoproteins of the invention typically comprise less than about 600 amino acids of the N terminus (including the chromophore or lyase domain) of the full length apoprotein, preferably less than about 515 amino acids, more preferably less than about 450 amino acids and most preferably less than about 400, 390, or even 350 N-terminal amino acids. Preferred apoproteins of the invention typically comprise between about 200 and about 400 N-terminal amino acids of the full length apoprotein, including the lyase domain. A preferred embodiment consists essentially of the lyase domain.
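
By way of illustration only, the length ranges recited above can be sketched in Python. The function names, the tier labels, and the treatment of each “about” value as a hard cutoff are assumptions made for this sketch and form no part of the disclosure.

```python
# Upper bounds (number of N-terminal residues) paired with the preference
# level recited in the text. Treating "about" as an exact cutoff is an
# assumption of this sketch.
PREFERENCE_TIERS = [
    (400, "most preferred"),
    (450, "more preferred"),
    (515, "preferred"),
    (600, "typical"),
]


def n_terminal_fragment(sequence: str, n_residues: int) -> str:
    """Return the first n_residues of a one-letter-code protein sequence."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return sequence[:n_residues]


def preference_tier(fragment_length: int) -> str:
    """Map a fragment length onto the (hypothetical) tier labels above."""
    for upper_bound, label in PREFERENCE_TIERS:
        if fragment_length < upper_bound:
            return label
    return "outside the stated ranges"


# Placeholder 1100-residue sequence, not a real apophytochrome.
full_length = "M" + "A" * 1099
fragment = n_terminal_fragment(full_length, 197)
tier = preference_tier(len(fragment))
```

Here a 197-residue fragment, the length of the Cph2-N197 example, falls in the shortest tier.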

[0024] The “bilin” components of the adducts of the invention are linear polypyrroles (e.g., di-, tri-, or tetrapyrroles) capable of fluorescing when associated with an apoprotein. Typically, the bilin components of the invention are isolated from vascular plants, algae, or cyanobacteria according to standard techniques. The bilin components can also be synthesized de novo. For a general discussion of bilins useful in the present invention see Falk (1989) Pp. 355-399 in: The Chemistry of Linear Oligopyrroles and Bile Pigments. Springer-Verlag, Vienna.

[0025] The phrase “nucleic acid” refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′ to the 3′ end. It includes cDNA, self-replicating plasmids, infectious polymers of DNA or RNA and non-functional DNA or RNA.

[0026] The term “subsequence” when referring to a nucleic acid refers to a nucleic acid sequence that comprises a part of a longer sequence of a nucleic acid, and when referring to a peptide refers to an amino acid sequence that comprises part of a longer sequence of a peptide, polypeptide or protein.

[0027] Two nucleic acid sequences or polypeptides are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below. The term “complementary to” is used herein to mean that the complementary sequence is identical to all or a portion of a reference polynucleotide sequence.

[0028] Sequence comparisons between two (or more) polynucleotides or polypeptides are typically performed by comparing the two sequences over a “comparison window” to identify and compare local regions of sequence similarity. A “comparison window”, as used herein, refers to a segment of at least about 20 contiguous positions, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

[0029] Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by computerized implementations of these algorithms such as CLUSTALW, GAP, BESTFIT, BLAST, FASTA, and TFASTA (Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.

[0030] “Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
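
The calculation described in the preceding paragraph can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned and padded to equal length, with “-” marking gaps; the function name and the gap character are illustrative choices, not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over one comparison window.

    Both arguments are assumed to be pre-aligned strings of equal length
    in which `gap` marks an addition or deletion.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Number of positions at which the identical base or residue occurs
    # in both sequences (a gap never counts as a match).
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    # Divide by the total number of positions in the window, times 100.
    return 100.0 * matched / len(aligned_a)


identity = percent_identity("ACGT-A", "ACCTTA")  # 4 matches over 6 positions
```

In this sketch a gapped position lowers the percentage because it counts toward the window length but never toward the matched positions.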

[0031] The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 60% sequence identity, preferably at least 80%, more preferably at least 90% and most preferably at least 95%, compared to a reference sequence using the programs described above (preferably BLAST) using standard parameters. In one embodiment, 25% sequence identity over a window of 200 amino acids coupled with information regarding the apophytochrome consensus sequence is sufficient to identify a new apophytochrome. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 40%, preferably at least 60%, more preferably at least 90%, and most preferably at least 95%. Polypeptides which are “substantially similar” share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.

[0032] Another indication that nucleotide sequences are substantially identical is if two nucleic acid molecules hybridize to each other, or to a third nucleic acid, under stringent conditions. Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength and pH. The T_m is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched complementary nucleic acid sequence. Typically, stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature is at least about 60° C. Stringent conditions for a standard Southern hybridization will include at least one wash (usually 2) in 0.2X SSC at a temperature of at least about 50° C., usually about 55° C., for 20 minutes, or equivalent conditions.
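
The numerical criteria in the preceding paragraph can be sketched as a simple check. The class and function names are hypothetical. Treating the “about” values as hard cutoffs, and treating a wash at or below 0.2X SSC as meeting the salt criterion, are assumptions of this sketch. The melting point itself is taken as a supplied value, since the text gives no formula for it.

```python
from dataclasses import dataclass


@dataclass
class Wash:
    ssc_x: float    # SSC concentration as a multiple, e.g. 0.2 for 0.2X SSC
    temp_c: float   # wash temperature in degrees Celsius
    minutes: float  # wash duration


def stringent_temperature_c(tm_c: float, offset_c: float = 5.0) -> float:
    """Temperature about 5 degrees C below the melting point, per the text."""
    return tm_c - offset_c


def is_stringent_southern(washes: list[Wash]) -> bool:
    """True if at least one wash meets the Southern criteria recited above."""
    return any(
        w.ssc_x <= 0.2 and w.temp_c >= 50.0 and w.minutes >= 20.0
        for w in washes
    )


washes = [Wash(ssc_x=2.0, temp_c=25.0, minutes=5.0),
          Wash(ssc_x=0.2, temp_c=55.0, minutes=20.0)]
meets_criteria = is_stringent_southern(washes)
```

The check does not attempt to model the “equivalent conditions” also permitted by the text.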

[0033] The term “conservative substitution” is used herein to refer to replacement of amino acids in a protein with different amino acids that do not substantially change the functional properties of the protein. Thus, for example, a polar amino acid might be substituted for a polar amino acid, a non-polar amino acid for a non-polar amino acid, and so forth. The following six groups each contain amino acids that are conservative substitutions for one another:

[0034] 1) Alanine (A), Serine (S), Threonine (T);

[0035] 2) Aspartic acid (D), Glutamic acid (E);

[0036] 3) Asparagine (N), Glutamine (Q);

[0037] 4) Arginine (R), Lysine (K);

[0038] 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and

[0039] 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
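
The six groups listed above lend themselves to a simple lookup. The following sketch is illustrative only; the names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and residues are given in the one-letter codes shown in parentheses above.

```python
# The six groups listed above, as sets of one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ and belong to the same group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residue, so not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


examples = {
    ("I", "V"): is_conservative("I", "V"),  # same group (5)
    ("D", "K"): is_conservative("D", "K"),  # different groups
}
```

Residues that appear in none of the six groups (for example glycine or cysteine) are never reported as conservative by this sketch.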

[0040] A biological “binding partner” or a member of a “binding pair” refers to molecules that specifically bind other molecules to form a binding complex such as antibody-antigen, lectin-carbohydrate, nucleic acid-nucleic acid, biotin-avidin, etc.

[0041] The term “specifically binds”, as used herein, when referring to a biomolecule (e.g., protein, nucleic acid, antibody, etc.), refers to a binding reaction which is determinative of the presence of a specific biomolecule within a heterogeneous population of proteins and/or other biologics. Thus, under designated conditions (e.g. immunoassay conditions in the case of an antibody), the specified ligand or antibody binds to its particular “target” biomolecule (e.g. a receptor protein) and does not bind in a significant amount to other proteins or other biomolecules present in the sample, or to other proteins or other biomolecules with which the ligand or antibody may come in contact in an organism.

[0042] The term “antibody”, as used herein, includes various forms of modified or altered antibodies. Such forms include, but are not limited to, an intact immunoglobulin, an Fv fragment containing only the light and heavy chain variable regions, an Fv fragment linked by a disulfide bond (Brinkmann et al. (1993) Proc. Natl. Acad. Sci. USA, 90: 547-551), an Fab or F(ab′)₂ fragment containing the variable regions and parts of the constant regions, a single-chain antibody and the like (Bird et al. (1988) Science 242: 424-426; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85: 5879-5883). The antibody may be of animal (especially hamster, mouse, rat, rabbit, pig, or goat) or human origin or may be chimeric (Morrison et al. (1984) Proc. Natl. Acad. Sci. USA 81: 6851-6855) or humanized (Jones et al. (1986) Nature 321: 522-525, and published UK patent application No: 8707252). Methods of producing antibodies suitable for use in the present invention are well known to those skilled in the art and can be found described in such publications as Harlow & Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory (1988), and Asai, Methods in Cell Biology Vol. 37: Antibodies in Cell Biology, Academic Press, Inc. N.Y. (1993).

BRIEF DESCRIPTION OF THE DRAWINGS

[0043] FIG. 1 shows the putative structural organization of the predicted protein products of these open reading frames. The roughly 200 amino acid chromophore domain of these proteins which include the cysteine site of bilin attachment is shown with an asterisk.

[0044] FIG. 2 shows a multiple sequence alignment of the chromophore domains of representative eukaryotic phytochromes (the Arabidopsis (At) phyA, phyB/D, phyC and phyE proteins and the green algal phytochrome (Mcphy1b)) and the cyanobacterial phytochrome sequences of the invention, cph1-2 and cpl1-6. In the figure, each cyanobacterial protein is referred to by its cyanobase locus designation. The correspondence between protein name and locus name is as follows:

Protein name    Locus name
Cph1            SLR0473
Cph2            SLL0821
Cpl1 (Cph3)     SLL1473
Cpl2 (Cph4)     SLL1124
Cpl3 (Cph5)     SLL0041
Cpl4 (Cph6)     SLR1212
Cpl5 (Cph7)     SLR1393
Cpl6 (Cph8)     SLR1969

[0045] The comparison was obtained using the Wisconsin Genetics Computer Group (GCG) program PILEUP. This multiple sequence alignment defines the bilin chromophore binding domain of the phytochrome superfamily.

[0046] FIG. 3A shows the expression vector used to express a Strep-Tagged version of the N-terminal 197 amino acid region of Cph2 (Cph2-N197).

[0047] FIG. 3B provides the sequence of Cph2-N197 (SEQ ID NO: 9).

DETAILED DESCRIPTION

[0048] This invention is directed to fluorescent adducts, referred to herein as phytofluors, and their use as fluorescent markers or labels in a variety of contexts. The phytofluors comprise an apoprotein component (e.g. an oat or cyanobacterial apophytochrome) joined to a bilin component (e.g., phycoerythrobilin (PEB)). The phytofluors (fluorescent adducts) may be chemically conjugated or fused (i.e. recombinantly expressed as a fusion protein) to a subject moiety that is to be so labeled. In a preferred embodiment the labeled moiety is a member of a biological binding pair for use in any known or later discovered technique involving fluorescent labeling of analytes or other moieties.

[0049] The apoproteins and bilins forming the fluorescent phytofluors of this invention are available from natural sources or can be modified to provide novel complexes having different absorbance, emission, or labeling characteristics. These compositions find use for labeling of virtually any molecule or material that is chemically compatible with the fluorescent adducts. The phytofluors are well suited for labeling biological molecules and are particularly useful for labeling a biochemical binding-pair member so that the resulting conjugates or fusions can be used in assays involving non-covalent binding to the complementary member of the specific binding pair. A wide variety of methods involve competitive or non-competitive binding of ligand to receptor for detection, analysis, or measurement of the presence of ligand or receptor.

[0050] Thus, for example, in one embodiment, this invention provides for antibodies or antibody fragments to which the fluorescent adducts (phytofluors) of this invention are joined (either covalently or non-covalently). The antibodies are capable of specifically binding to the antigen to which they are directed. Detection of the presence, absence, or amount of fluorescence of the antibody-bound fluorescent adduct of this invention provides an indication of presence, absence, or amount of analyte to which the antibody is directed.

[0051] Similarly phytofluor labeled antibodies, or other ligands, can be used in immunohistochemical applications. In this context, fluorescent adduct labeled antibodies are used to probe cells, tissues, and sections thereof. When the subject sample is contacted with the labeled ligand, the ligand binds and localizes to specific regions of the sample in which the target molecule (the molecule or moiety recognized by the ligand) is located. Localization and/or quantification of the fluorescent signal produced by the attached phytofluor provides information concerning the location and/or quantity of the target molecule in the sample. One of skill in the art will appreciate that the phytofluors of this invention are also well suited for in situ and in vivo labeling of molecules, cells, and cellular components.

[0052] The phytofluor labels of this invention can be attached to a wide variety of biological molecules in addition to antibodies. These may include proteins, in particular proteins recognized by particular antibodies, receptors, enzymes, or other ligands, nucleic acids (e.g., single or double stranded DNA, cDNA, mRNA, cRNA, rRNA, tRNA, etc.), various sugars and polysaccharides, lectins, enzymes, and the like. Uses of the various labeled biomolecules will be readily apparent to one of skill in the art. Thus, for example, labeled nucleic acids can be used as probes to specifically detect and/or quantify the presence of the complementary nucleic acid in, for example, a Southern blot.

[0053] The phytofluors of this invention can be attached to non-biological molecules and various articles of manufacture. Thus, for example where it is desired to associate an article of manufacture with a particular manufacturer, distributor, or supplier, the phytofluor, or simply one component of the phytofluor can be attached to the subject article. Later development (e.g., by addition of the second component such as bilin or apoprotein) and exposure to an appropriate light source will provide a fluorescent signal identifying the article as one from a source of such labeled articles.

[0054] In another embodiment, the phytofluors of this invention can be used for probing protein-protein interactions. In a preferred embodiment, two apoprotein cDNA constructs are used. The first construct will encode an apoprotein species whose assembly with a given bilin emits at a well defined wavelength (donor). The second construct will encode an apoprotein species whose assembly with the same, or different, bilin produces a fluorescent species that both absorbs and emits light at longer wavelengths (acceptor). Protein-protein interaction between two proteins of interest (e.g., protein X and protein Y) is identified following their co-expression as translational fusions with apoprotein in constructs 1 (donor) and 2 (acceptor) using fluorescence energy transfer from the shorter wavelength-absorbing donor species to the longer wavelength-absorbing acceptor species. In a preferred embodiment, the fluorescent phytochrome species are selected to have good spectral overlap. Proximity caused by the protein-protein interaction between the translationally fused proteins X and Y will then permit fluorescence energy transfer, thereby providing an indication of proximity between protein X and protein Y. This application can utilize the uptake of exogenous bilin pigment into living cells, or alternatively, may use endogenously expressed bilins in various organisms and cell types.
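The distance dependence underlying this proximity readout can be illustrated with the standard Förster relation, E = 1/(1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the Förster radius at which transfer efficiency is 50%. The Python sketch below is illustrative only: the function name is hypothetical, and the 5 nm Förster radius is an assumed placeholder, not a measured property of any phytofluor pair.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency for a donor-acceptor separation r_nm.

    r0_nm is the Förster radius, i.e. the separation at which the
    efficiency equals 0.5. Both distances are in nanometres.
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r_nm must be >= 0 and r0_nm must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical Förster radius, chosen only for illustration.
R0_NM = 5.0

# Efficiency at half, equal to, and twice the Förster radius.
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```

Because of the sixth-power dependence, efficiency drops from about 0.98 at half the Förster radius to about 0.015 at twice that distance, which is why detectable transfer is taken as an indication that the two fusion partners are in close proximity.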

[0055] In a specific application, a yeast or E. coli strain containing donor construct 1, engineered to produce a fluorescent chimeric protein bait with a known cDNA sequence, will be co-transformed, simultaneously or sequentially, with a prey cDNA library (i.e., plasmid or phage). The prey cDNA library will be constructed using acceptor construct 2 for expression of apoprotein-protein fusions which yield fluorescent tagged protein products in the presence of the correct bilin. Co-transformation events which express prey proteins in the library that interact with the expressed bait polypeptide can be identified by illuminating the shorter wavelength absorbing donor phytofluor species and viewing emission from the longer wavelength emitting acceptor phytofluor species. Actinic illumination for this screen can be obtained either with a quartz halogen projector lamp filtered through narrow bandpass filters or with a laser source, with fluorescence detection of colonies using digital imaging technology (Arkin et al. (1990) Bio/Technology 8: 746-749). Fluorescence-activated cell sorting (FACS) can also be used to identify cells co-expressing interacting donor and acceptor proteins.

[0056] In another application of this invention, the apoprotein cDNA in donor construct 1 “prey” is substituted with a green fluorescent protein (GFP) cDNA for construction of GFP-tagged cDNA expression libraries. By co-expression of an apoprotein-tagged bait construct (Construct 2 above) with the GFP-tagged “prey” library, proteins which interact with the bait polypeptide will be visualized by energy transfer from GFP to the phytochrome tagged bait using, for example, digital imaging technology or FACS. The ability of GFP to spontaneously assemble its fluorophore makes it unnecessary to make two apoprotein constructs which have different fluorescence properties.

[0057] In a third specific application, chimeric apoprotein-protein X cDNAs (where protein X is any protein of interest) are expressed in transgenic eukaryotes (yeast, plants, Drosophila, etc.) in order to study the subcellular localization of protein X in situ. Following feeding of exogenous bilin, subcellular localization can be performed using fluorescence microscopy (e.g., laser confocal microscopy).

[0058] In one particularly preferred embodiment, the phytofluors of this invention are used as in vitro or in vivo labels in a manner analogous to the use of Green Fluorescent Protein (GFP). This typically involves transfecting a cell with a nucleic acid encoding an apoprotein in such a manner that the cell expresses the apoprotein (e.g., the nucleic acid is a component of an expression cassette). When the apoprotein is contacted with the appropriate bilin, supplied either exogenously or produced endogenously, the phytofluor (fluorescent adduct) self assembles and thereby produces a fluorescent marker.

[0059] Uses of such a marker are well known to those of skill in the art (see, e.g., U.S. Pat. No. 5,491,084 which describes uses of GFP). In one preferred embodiment, the phytofluor can be used as a marker to identify transfected cells. In the simplest approach, a nucleic acid expressing an apoprotein such as that described in Example 1 can be provided as a marker in a vector. The apoprotein, along with the cloned protein of interest, will be expressed in the transfected host. Application of the appropriate exogenous bilin will cause formation of the fluorescent adduct, permitting ready detection of the transformed cell. Alternatively, the apoprotein can form an adduct with an endogenous bilin produced by the transformed organism (e.g., a plant cell). In this embodiment, the apoprotein will be a variant which forms a fluorescent adduct when combined with the naturally occurring bilin.

[0060] Based on the disclosure provided herein, one of skill will readily appreciate that there are numerous other uses to which the phytofluors (fluorescent adducts) of this invention can be applied.

[0061] Preparation of apoprotein polypeptides.

[0062] Apoprotein polypeptides used in the phytofluors of this invention can be expressed recombinantly or isolated from natural sources according to standard techniques. The polypeptides or nucleic acids encoding them can be prepared from a wide range of organisms including vascular plants, algae, and cyanobacteria.

[0063] In higher plants, apoprotein polypeptides are encoded by a gene family of at least five structurally related members designated PHYA-PHYE (see, Terry et al. (1993) Arch. Biochem. Biophys. 306:1-15 and Sharrock et al. (1989) Genes Dev. 3:1745-1757). The primary structures of all apoproteins are very similar, with a polypeptide of about 1100 amino acids in length (Quail et al. in Phytochrome Properties and Biological Action (Thomas and Johnson eds.) pp 13-38, (Springer-Verlag, Berlin 1991)). The native protein is a homodimer, the individual subunits being composed of two major domains. The globular 70 kD N-terminal domain contains the hydrophobic pocket in which the bilin chromophore resides (Gabriel et al. (1993) J. Theor. Biol. 44:617-645). The more elongated 55 kD carboxyl terminal domain contains the sites at which the two subunits are associated (Edgerton et al. (1992) Plant Cell 4:161-171). This domain is also responsible for phytochrome function, although both domains are thought to participate in the signal transmission process in native phytochrome.

[0064] Phytofluor apoproteins can be isolated from natural sources, most preferably from bilin-deficient natural sources including vascular and nonvascular plants, algae and cyanobacteria, using standard protein isolation techniques well known to those of skill in the art. Generally, these methods include selective precipitation with such substances as ammonium sulfate, column chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification: Principles and Practice, Springer-Verlag: New York (1982).

[0065] In preferred embodiments, the polypeptides are produced recombinantly. Standard methods for preparation of recombinant proteins can be used for this purpose. For a discussion of the general laboratory procedures required for this purpose see, Sambrook, et al., (1989) Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

[0066] Nucleic acids encoding apoprotein polypeptides can be isolated from a number of organisms according to standard techniques. Exemplary genes are those isolated from higher plants (e.g., AsphyA and AtphyA), and the green alga Mesotaenium caldariorum (i.e. Mcphy1b). In addition, genes encoding apophytochrome can be obtained from cyanobacteria. It was a discovery of this invention that the cyanobacterium Synechocystis sp. produces an apophytochrome. In particular, the open reading frame listed in GenBank D64001, locus 1001165 and designated herein as S6803phy1 was determined to be an apophytochrome by sequence alignment methods. Having identified herein that cyanobacteria produce apophytochromes, identification of other cyanobacterial apophytochromes can be accomplished using routine methods available to one of skill in the art. Sequences for these apoproteins are provided in the sequence listing below. The corresponding nucleic acid sequences are known to those of skill in the art. One of skill will recognize that these sequences can be used to design primers and probes for isolation of related genes in other organisms. Cyanobacterial nucleic acid sequences are also available at the Cyanobase Web Site, http://www.kazusa.or.jp/cyano/.

[0067] Generally, recombinant expression techniques involve the construction of recombinant nucleic acids and the expression of genes in transfected cells. Molecular cloning techniques to achieve these ends are known in the art. A wide variety of cloning and in vitro amplification methods suitable for the construction of recombinant nucleic acids are well-known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif. (Berger); and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1994 Supplement) (Ausubel).

[0068] The polypeptides are expressed in a recombinantly engineered cell such as plants, bacteria, yeast, insect (especially employing baculoviral vectors), and mammalian cells. It is expected that those of skill in the art are knowledgeable in the numerous expression systems available for expression of the DNA encoding apoprotein polypeptides. No attempt to describe in detail the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made.

[0069] In brief, the expression of natural or synthetic nucleic acids encoding the polypeptides will typically be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or inducible), followed by incorporation into an expression vector. The vectors can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical expression vectors contain transcription and translation terminators, initiation sequences, and promoters useful for regulation of the expression of the DNA encoding the binding domains. To obtain high level expression of a cloned gene, it is desirable to construct expression plasmids which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation, and a transcription/translation terminator.

[0070] Expression in Prokaryotes

[0071] Examples of regulatory regions suitable for this purpose in E. coli are the promoter and operator region of the E. coli tryptophan biosynthetic pathway as described by Yanofsky (1984) J. Bacteriol., 158: 1018-1024 and the leftward promoter of phage lambda (P_(L)) as described by Herskowitz and Hagen (1980) Ann. Rev. Genet., 14: 399-445. The inclusion of selection markers in DNA vectors transformed in E. coli is also useful. Examples of such markers include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol. See Sambrook et al. for details concerning selection markers for use in E. coli.

[0072] Expression systems for expressing the polypeptides are available using E. coli, Bacillus sp. (Palva et al. (1983) Gene 22:229-235; Mosbach et al. Nature, 302:543-545) and Salmonella. E. coli systems are preferred.

[0073] The apoprotein polypeptides produced by prokaryote cells may not necessarily fold properly. During purification from E. coli, the expressed polypeptides may first be denatured and then renatured. This can be accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine HCl and reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The polypeptides are then renatured, either by slow dialysis or by gel filtration (see, e.g., U.S. Pat. No. 4,511,503).

[0074] Expression in Eukaryotes

[0075] A variety of eukaryotic expression systems such as yeast, insect cell lines and mammalian cells, are known to those of skill in the art. As explained briefly below, the apoprotein polypeptides may also be expressed in these eukaryotic systems.

[0076] Expression in Yeast

[0077] Synthesis of heterologous proteins in yeast is well known and described. Methods in Yeast Genetics, Sherman et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the various methods available to produce the polypeptides in yeast.

[0078] Preferred yeast expression systems are described in Wahleithner et al. (1991) Proc. Natl. Acad. Sci. USA 88:10387-10391, Murphy and Lagarias (1997) Photochem. Photobiol., 65: 750-758, and Wu et al. (1996) Proc. Natl. Acad. Sci., USA, 93: 8989-8994. Further examples of yeast expression are described below. A number of yeast expression plasmids like YEp6, YEp13, YEp4 can be used as vectors. A gene of interest can be fused to any of the promoters in various yeast vectors. The above-mentioned plasmids have been fully described in the literature (Botstein et al. (1979) Gene, 8: 17-24; Broach et al. (1979) Gene, 8: 121-133).

[0079] The polypeptides can be isolated from yeast by lysing the cells and applying standard protein isolation techniques to the lysates. The monitoring of the purification process can be accomplished by using spectroscopic techniques, or by using Western blot techniques or radioimmunoassays, or other standard immunoassay techniques.

[0080] Expression in Plants

[0081] The apoprotein polypeptides of this invention can also be expressed in plants or plant tissues. Plant tissue includes differentiated and undifferentiated tissues of plants including, but not limited to, roots, shoots, leaves, pollen, seeds, tumor tissue, such as crown galls, and various forms of aggregations of plant cells in culture, such as embryos and calli. The plant tissue may be in plants, cuttings, or in organ, tissue, or cell culture.

[0082] The recombinant DNA molecule encoding the apoprotein polypeptide under the control of promoter sequences may be introduced into plant tissue by any means known to the art. The technique used for a given plant species or specific type of plant tissue depends on the known successful techniques. The various DNA constructs described above may be introduced into the genome of the desired plant by a variety of conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the plant cell using polyethylene glycol precipitation (Paszkowski et al. (1984) Embo J. 3: 2717-2722), electroporation and microinjection of plant cell protoplasts (Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82: 5824), or the DNA constructs can be introduced into plant tissue using ballistic methods, such as DNA particle bombardment (Klein et al. (1987) Nature 327: 70-73). Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker gene(s) (if present) into the plant cell DNA when the cell is infected by the bacteria. For a review of gene transfer methods for plant and cell cultures see, Fisk et al. (1993) Scientia Horticulturae 55: 5-36 and Potrykus (1990) CIBA Found. Symp. 154: 198.

[0083] Agrobacterium tumefaciens-mediated transformation techniques are the most commonly used techniques for transferring genes into plants. These techniques are well described in the scientific literature. See, for example, Horsch et al. (1984) Science 233: 496-498, Fraley et al. (1983) Proc. Natl. Acad. Sci. USA 80: 4803, Hooykaas (1989) Plant Mol. Biol 13: 327-336, Bechtold et al. (1993) Comptes Rendus De L Academie Des Sciences Serie III-Sciences De La Vie-Life Sciences 316: 1194-1199, and Valvekens et al. (1988) Proc. Natl. Acad. Sci. USA 85: 5536-5540.

[0084] All species which are natural plant hosts for Agrobacterium are transformable in vitro. Most dicotyledonous species can be transformed by Agrobacterium. Monocotyledonous plants, and in particular, cereals, have not previously been regarded as natural hosts to Agrobacterium. There is, however, growing evidence that monocots can be transformed by Agrobacterium. Using novel experimental approaches cereal species such as rye (de la Pena et al. (1987) Nature 325: 274-276), corn (Rhodes et al. (1988) Science 240: 204-207), and rice (Shimamoto et al., (1989) Nature 338: 274-276) may now be transformed.

[0085] Transformation of a number of woody plants using Agrobacterium and other methods has been described (Shuerman et al. (1993) Scientia Horticulturae 55: 101-124). For instance, regeneration and transformation of apples is described in James et al. (1989) Plant Cell Rep. 7: 658-661. Tissue culture procedures for apple, including micropropagation (Jones (1976) Nature 262: 392-393; Zimmerman (1983) pp. 124-135 in Methods in Fruit Breeding) and adventitious bud formation (James (1987) Biotechnology and Genetic Engineering Reviews, 5: 33-79), have also been described. After transformation, transformed plant cells or plants comprising the introduced DNA must be identified. A selectable and/or scorable marker gene is typically used. However, the apoproteins can be detected directly through the formation of a fluorescent adduct with a bilin. In another embodiment, the apophytochrome (apoprotein) can be modified to utilize the endogenous or modified bilins produced in plants. Transformed plant cells can be selected by growing the cells on growth medium containing the appropriate antibiotic. In some instances, the presence of opines can also be used if the plants are transformed with Agrobacterium. After selecting the transformed cells, one can confirm expression of the introduced apoprotein gene(s). Simple detection of mRNA encoded by the inserted DNA can be achieved by well known methods in the art, such as Northern blot hybridization. The inserted sequence can be identified using the polymerase chain reaction (PCR) and Southern blot hybridization, as well (see, e.g., Sambrook, supra.).

[0086] Transformed plant cells (e.g., protoplasts) which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus expresses the desired apoprotein. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) pp. 124-176 In: Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, MacMillan Publishing Company, New York; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73; CRC Press, Boca Raton, 1985. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al. (1987) Ann. Rev. of Plant Phys. 38: 467-486.

[0087] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

[0088] Expression in Mammalian and Insect Cell Cultures

[0089] Illustrative of cell cultures useful for the production of the apoprotein polypeptides are cells of insect or mammalian origin. Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell suspensions may also be used. Illustrative examples of mammalian cell lines include VERO and HeLa cells, Chinese hamster ovary (CHO) cell lines, WI38, BHK, Cos-7 or MDCK cell lines.

[0090] When the host cell is of insect or mammalian origin illustrative expression control sequences are obtained from the SV-40 promoter (Science, 222:524-527, 1983), the CMV I.E. Promoter (Proc. Natl. Acad. Sci. 81:659-663, 1984) or the metallothionein promoter (Nature 296:39-42, 1982). The cloning vector containing the expression control sequences is cleaved using restriction enzymes and adjusted in size as necessary or desirable and ligated with DNA coding for the apoprotein polypeptides by means well known in the art.

[0091] Expression of Variant Apoprotein Polypeptides

[0092] The nucleotide sequences used to transfect the host cells described above and used for production of recombinant binding domain polypeptides can be modified according to standard techniques to yield polypeptides with a variety of desired properties. The binding domain polypeptides of the present invention can be readily designed and manufactured utilizing various recombinant DNA techniques well known to those skilled in the art. For example, the binding domain polypeptides can vary from the naturally-occurring sequence at the primary structure level by amino acid insertions, substitutions, deletions, and the like. These modifications can be used in a number of combinations to produce the final modified protein chain.

[0093] The amino acid sequence variants can be prepared with various objectives in mind, including facilitating purification and preparation of the recombinant polypeptide, biological stability, and/or fluorescence quantum yields of the adducts of the invention.

[0094] In general, modifications of the sequences encoding the apoprotein polypeptides may be readily accomplished by a variety of well-known techniques, such as site-directed mutagenesis (see, Gillam and Smith (1979) Gene 8:81-97 and Roberts et al. (1987) Nature 328:731-734), or chemical modification (Glazer et al. (1975) p. 205 in Chemical Modification of Proteins, Elsevier, New York).

[0095] One of ordinary skill will appreciate that the effect of many mutations is difficult to predict. Thus, most modifications are evaluated by routine screening in a suitable assay for the desired characteristic. A particularly useful assay using expression in the yeast Pichia pastoris is described below and in the examples. For instance, this assay can be used to test random genetic approaches to identify ‘gain-of-function’ mutations which affect the spectroscopic properties of phytochrome.

[0096] Fluorescence-based screens of the phytochrome mutant expressing cell population are particularly useful in the Pichia system because these cells synthesize PΦB (Wu et al. (1996) Proc. Natl. Acad. Sci. USA, 93: 8989-8994). In this way, mutations affecting the primary photochemical step in the conversion of Pr to Pfr (i.e. 15Z to 15E photoisomerization) will exhibit enhanced fluorescence. Fluorescence-activated cell sorting (FACS) is particularly useful in this assay. The introduction of a bulky amino acid side chain near the D-ring of the chromophore is one example of the type of mutation which can be isolated by this screen.

[0097] Specific amino acid residues important to chromophore-protein interactions in phytochrome can be identified. For instance, epitope-tagged versions of recombinant phytochromes derived from higher plants (i.e. AsphyA-ST and AtphyA-ST), the green alga Mesotaenium caldariorum (i.e. Mcphy1b-ST) and the cyanobacterium Synechocystis sp. PCC6803 (i.e. S6803phy1-ST), all four of which have been successfully expressed and assembled with bilins, can be used to identify useful variants.

[0098] Phytochromes can be used in these methods. The HPLC analyses are greatly simplified by the use of a chromophore domain fragment. The expression and purification of such mutants of AsphyA or Mcphy1b is based on chromophore domain mutant expression studies of other species (see, e.g., Deforce et al. (1991) Proc Natl Acad Sci USA 88:10392-10396 and Schmidt et al. (1996) J. Photochem. Photobiol., B: Biology 34: 73-77).

[0099] In one embodiment, a preferred apoprotein consists of the chromophore domain, i.e., the N-terminal portion of the apoprotein sufficient for lyase activity. In a particularly preferred embodiment, the apoprotein consists of the minimal chromophore domain. Such minimal domains are readily determined by performing apoprotein truncations and assaying the ability of the apoprotein to reassemble with an added bilin as described herein. One such shortened apoprotein consists of 197 amino acids, described in the Examples.
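As a rough illustration of how a truncation series of this kind might be planned in silico, the following Python sketch lists N-terminal fragments of progressively shorter length. It is a sketch under stated assumptions only: the function name, the step size, and the toy sequence are hypothetical, only C-terminal truncations are generated, and each fragment would still have to be expressed and assayed for bilin assembly as described herein.

```python
from typing import List


def c_terminal_truncations(sequence: str, min_length: int, step: int) -> List[str]:
    """Return fragments that keep the N terminus and shrink from the C terminus.

    The first fragment is the full sequence; each later fragment is `step`
    residues shorter, and no fragment is shorter than `min_length`.
    """
    if min_length <= 0 or step <= 0:
        raise ValueError("min_length and step must be positive")
    fragments = []
    length = len(sequence)
    while length >= min_length:
        fragments.append(sequence[:length])
        length -= step
    return fragments


# Toy sequence for illustration only; not taken from the sequence listing.
toy_apoprotein = "MKTAYIAKQRQISFVKSHFSRQ"

for fragment in c_terminal_truncations(toy_apoprotein, min_length=10, step=4):
    print(len(fragment), fragment)
```

The shortest fragment that still assembles with bilin in the assay would then be a candidate minimal chromophore domain.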

[0100] Particular amino acid sites can also be modified. One such site is the chromophore binding site cysteine residue (i.e. cys₃₂₂ of AsphyA, cys₃₂₄ of Mcphy1b, or cys₂₅₉ of Cph1). These residues can be modified with a sulfhydryl-specific bifunctional photoaffinity crosslinking reagent such as p-azidophenacyl bromide or N(4-azidophenylthio)phthalimide (APTP). The bifunctional photoaffinity crosslinking reagent will be introduced into the molecule via reaction with cys₃₂₂ followed by UV crosslinking. Having identified putative chromophore binding site residues with this approach, saturation site-specific mutagenesis experiments will be undertaken to evaluate the importance of these residues to bilin attachment, photoactivity and/or holoprotein conformation. Control experiments to determine whether chemical modification grossly alters the apoprotein's conformation will also be performed with each sulfhydryl reagent. Other residues can be modified as shown in Example 2. Some residues are required for bilin binding and therefore must be conserved in modified apoproteins of the invention. For instance, as shown in Example 2, modification of the glutamic acid residue at position 189 of Cph1 abolished bilin binding.
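Saturation site-specific mutagenesis at a given position amounts to replacing the wild-type residue with each of the other 19 standard amino acids. A minimal Python sketch of enumerating such a variant set is given below; the function name and the toy sequence are hypothetical and do not correspond to any sequence in the sequence listing.

```python
from typing import Dict

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(sequence: str, position: int) -> Dict[str, str]:
    """Map mutation labels (e.g. 'E3A') to variant sequences.

    `position` is 1-based, matching the residue numbering used in the text.
    The wild-type residue itself is skipped, so 19 variants are returned
    when the wild-type residue is a standard amino acid.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    wild_type = sequence[position - 1]
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue
        label = f"{wild_type}{position}{residue}"
        variants[label] = sequence[:position - 1] + residue + sequence[position:]
    return variants


# Toy sequence for illustration only.
variants = saturation_variants("MKELC", 3)
print(len(variants), "variants, e.g. E3A ->", variants["E3A"])
```

Each variant in such a set would then be expressed and tested for bilin attachment, photoactivity, and holoprotein conformation.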

[0101] Based on multiple sequence alignment of the chromophore domains of phytochromes, directed mutagenesis can be carried out. In one embodiment, a “chemical rescue” approach can be employed to help distinguish specific local effects from gross structural perturbations caused by individual mutations (see, Toney et al. (1989) Science 243:1485-1488). Using this technique, site-directed mutations at conserved arg and trp residues can be introduced within the chromophore domain of phytochrome. Arg₂₃₇ of AsphyA is an example of a good target for mutagenesis because it is the only conserved arg residue in the chromophore domain, and thus is a potential candidate for tethering the propionic acid side chains of the bilin chromophore.

[0102] A similar approach is used to examine the importance of conserved tryptophan residues, beginning with the two universally invariant trp₃₆₆ and trp₄₇₅ of AsphyA. In this case, chemical rescue will employ indole prosthesis. The potential importance of trp residues to the phytochrome photocycle has already been implicated by resonance Raman spectroscopy (see, e.g., Mizutani et al. (1991) Biochemistry 30:10693-10700).

[0103] Other cysteine residues in the chromophore domain can also be mutagenized. First, there are relatively few cysteine residues in phytochrome with as few as 6 cysteines in the chromophore domain of S6803phy1. In addition, aside from the site of chromophore attachment, only one other cysteine (i.e. cys₃₈₇ on AsphyA) is found on almost all of the known phytochromes, the notable exception being rcaE. This suggests that most, if not all of the cysteines are dispensable, and could be substituted with isosteric serine residues without any significant structural or functional effect. For instance, the five cysteine residues in the chromophore domain of S6803phy1 can be substituted with serine residues. The photochemical properties of these mutant constructs can be examined to ascertain if the absorption coefficient, photoequilibrium and/or photochemical quantum yields are altered by mutagenesis. Another preferred embodiment is a cysteine-deficient (except for cys₂₅₉ of S6803phy1), photoactive phytochrome mutant. This mutant is particularly useful for structural studies such as crosslinking experiments proposed above. Moreover, re-introduction of cysteine residues at selected positions in this cysteine-deficient mutant can be used for structural analyses, and for specific cross-linking to other macromolecules.
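The design of a cysteine-deficient construct of this kind can be sketched in Python as a substitution of every cysteine with serine except at the chromophore attachment site. The function name and toy sequence below are hypothetical; the retained position would be the attachment-site cysteine of the apoprotein actually used.

```python
from typing import Set


def cys_to_ser(sequence: str, keep_positions: Set[int]) -> str:
    """Replace every cysteine (C) with serine (S) except at retained positions.

    `keep_positions` holds 1-based residue numbers that must remain cysteine,
    e.g. the chromophore attachment site.
    """
    for pos in keep_positions:
        if not 1 <= pos <= len(sequence) or sequence[pos - 1] != "C":
            raise ValueError(f"position {pos} is not a cysteine in this sequence")
    return "".join(
        "S" if residue == "C" and index not in keep_positions else residue
        for index, residue in enumerate(sequence, start=1)
    )


# Toy sequence for illustration only; position 4 plays the role of the
# attachment-site cysteine.
print(cys_to_ser("MCACKC", {4}))
```

Starting from such a construct, single cysteines can be reintroduced at selected positions by reversing individual substitutions.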

[0104] Preparation of bilins

[0105] The bilin component of the adducts of the invention can be isolated from the appropriate natural source or synthesized according to techniques known in the art. Methods for synthesis of the dimethyl ester of phytochromobilin are described for instance in Weller et al. (1980) Chem. Ber. 113:1603-1611. Conversion of the dimethyl ester to the free acid can be accomplished according to known techniques (see, e.g., Greene and Wuts, Protective Groups in Organic Synthesis 2d ed. (John Wiley and Sons, 1991)).

[0106] Methods for isolating bilins including phytochromobilin, phycocyanobilin (PCB), and phycoerythrobilin (PEB) from natural sources are also described in the art. For instance crude phycocyanobilin can be prepared from Spirulina platensis as described by Terry et al. (1993) J. Biol. Chem. 268:26099-26106. Crude phytochromobilin and PEB can be prepared by methanolysis of Porphyridium cruentum cells as described by Cornejo et al. (1992) J. Biol. Chem. 267: 14790-14798. The structures of phytochromobilin, PCB, and PEB are shown in FIG. 1 of WO 98/04700.

[0107] Attachment of fluorescent adducts to subject molecules.

[0108] Tagged moiety.

[0109] The conjugates of the subject invention are fluorescent adducts bound either covalently or non-covalently, normally covalently, to a particular moiety to be detected. Virtually any moiety to which it is desired to attach a fluorescent label is suitable. The moiety can be a macroscopic article such as an article of manufacture that is to be fluorescently tagged, or alternatively, the moiety can be microscopic, such as a cell, an organelle, or a single molecule.

[0110] Again, virtually any molecule can be tagged. Typically, however, the moiety to be tagged and detected will be a biomolecule such as a polypeptide, oligopeptide, nucleic acid, polysaccharide, oligosaccharide, lipid, and the like. For instance, the subject molecule may be a ligand or receptor. A “ligand”, as used herein, refers generally to all molecules capable of reacting with or otherwise recognizing or binding to a second biological macromolecule, e.g., a receptor, antigen, or other molecule on a target cell. Specifically, examples of ligands include, but are not limited to, antibodies, lymphokines, cytokines, receptor proteins (e.g., CD4, CD8), solubilized receptor proteins (e.g., solubilized T-cell receptor, soluble CD4), hormones, growth factors, and the like which specifically bind particular target cells. A “growth factor” as used herein refers to a protein ligand that stimulates cell division or differentiation or inhibits cell division or stimulates or inhibits a biological response like motility or secretion of proteins. Growth factors are well known to those of skill in the art and include, but are not limited to, platelet-derived growth factor (PDGF), epidermal growth factor (EGF), insulin-like growth factor (IGF), transforming growth factor β (TGF-β), fibroblast growth factors (FGF), interleukin 2 (IL2), nerve growth factor (NGF), interleukin 3 (IL3), interleukin 4 (IL4), interleukin 1 (IL1), interleukin 6 (IL6), interleukin 7 (IL7), granulocyte/macrophage colony-stimulating factor (GM-CSF), granulocyte colony-stimulating factor (G-CSF), macrophage colony-stimulating factor (M-CSF), erythropoietin and the like. One of skill in the art recognizes that the term growth factor as used herein generally includes cytokines and colony stimulating factors.

[0111] Attachment of the phytofluor to the moiety.

[0112] The proteinaceous portions of the fluorescent adducts (phytofluors) referred to here as the apoproteins provide a wide range of functional groups for conjugation to proteinaceous and non-proteinaceous molecules. Functional groups which are present include, but are not limited to amino, thio, hydroxyl, and carboxy. In some instances, it may be desirable to introduce, delete, or modify functional groups, particularly thio groups where the apoprotein is to be conjugated to another protein.

[0113] Depending upon the nature of the molecule (e.g., member of a specific binding pair) to be conjugated to the phytofluor complex, the ratio of the two moieties will vary widely, where there may be a plurality of subject molecules to one phytofluor or apoprotein or, conversely, where there may be a plurality of phytofluors or apoproteins to one subject molecule. Of course, the molar ratio of the molecule (moiety) to be labeled to the phytofluor or apoprotein may be about 1:1. In addition, in some instances, initial intermediates are formed by covalently conjugating a small ligand to a fluorescent adduct and then forming a specific binding pair complex with the complementary receptor, where the receptor then serves as a ligand or receptor in a subsequent complex or is itself covalently attached to a ligand or receptor intended for use in a subsequent complex.

[0114] The procedure for attaching a subject molecule to the phytofluor or an apoprotein of the fluorescent adduct will vary according to the chemical structure of the agent. As indicated above, the apoproteins contain a variety of functional groups (e.g., -OH, -COOH, -SH, or -NH₂), which are available for reaction with a suitable functional group on an agent molecule to bind the agent thereto. Alternatively, the apoprotein may be derivatized to expose or attach additional reactive functional groups. The derivatization may involve attachment of any of a number of linker molecules such as those available from Pierce Chemical Company, Rockford Ill. A bifunctional linker having one functional group reactive with a group on a particular agent, and another group reactive with an antibody, may be used to form the desired immunoconjugate.

[0115] Alternatively, derivatization may involve chemical treatment of the antibody; e.g., glycol cleavage of the sugar moiety of the glycoprotein antibody with periodate to generate free aldehyde groups. The free aldehyde groups on the antibody may be reacted with free amine or hydrazine groups on an agent to bind the agent thereto (see, e.g., U.S. Pat. No. 4,671,958). Procedures for generation of free sulfhydryl groups on antibodies or antibody fragments are also known (see, e.g., U.S. Pat. No. 4,659,839). Many procedures and linker molecules for attachment of various compounds, including radionuclide metal chelates, toxins and drugs, to proteins (e.g., to antibodies) are known. See, for example, European Patent Application No. 188,256; U.S. Pat. Nos. 4,671,958, 4,659,839, 4,414,148, 4,699,784, 4,680,338, 4,569,789, and 4,589,071; and Borlinghaus et al. (1987) Cancer Res. 47: 4071-4075.

[0116] Linking agents suitable for joining the adducts of this invention to nucleic acids are also well known. For example, linking agents which are specific to the free secondary hydroxyl normally present at the 3′ end include phosphites, succinic anhydride and phthalamide. Linking agents which are specific to the phosphate normally present on the sugar at the 5′ end (at least for most naturally occurring polynucleotides or products of most cleavage reactions) include carbodiimides such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide, with or without imidazole or 1-methylimidazole. See Chu et al. (1983) Nucleic Acids Res. 11: 6513-6529.

[0117] Use of apoproteins as affinity chromatography reagents

[0118] In addition to use in preparation of phytofluors, the apoproteins of the invention can be used in standard affinity chromatography methods to isolate desired compounds. For instance, as noted above, the apoproteins of the invention form a covalent bond with the bilin component of phytochromes and phytofluors; this interaction can be used to isolate and characterize novel bilins from plant materials. Thus, the apoproteins (e.g., those consisting of the chromophore domain) can be attached to a solid surface (e.g., beads) according to standard techniques (e.g., via a genetically engineered affinity peptide tag) and used as an affinity chromatography reagent in standard isolation techniques.

[0119] Alternatively, nucleic acids encoding apoproteins can be used to prepare fusion proteins to identify proteins that interact with the fusion partner. By attaching a bilin to a solid matrix in an appropriate manner, chromophore domain-containing test proteins could be ‘captured’ enabling the isolation of proteins that interact with them.

[0120] Apoprotein-containing kits.

[0121] In one embodiment this invention provides kits utilizing the labels and/or binding domains of this invention. The kits preferably include one or more of the apoproteins of this invention and/or one or more nucleic acids encoding the apoproteins. Where the kits are intended for labeling, the kits can additionally include one or more bilins that form fluorescent adduct(s) with the apoprotein, and/or appropriate reagents for coupling the apoprotein and/or bilin to the moiety that is to be labeled.

[0122] In another embodiment the kits can include nucleic acids encoding the apoprotein. The nucleic acid can be a nucleic acid vector containing appropriate restriction sites to facilitate insertion of a heterologous DNA such that the vector expresses the heterologous polypeptide in fusion with the apoprotein.

[0123] In still another embodiment, the kits contain apoproteins for creation of an affinity column. In this instance, the kits can additionally include an affinity matrix to which the apoprotein can be bound, or the apoprotein can be provided already bound to a solid support (e.g. to polymeric beads or other materials).

[0124] The kits may optionally contain any of the buffers, reagents, and/or media that are useful for the practice of the methods of this invention.

[0125] In addition, the kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods of this invention. While the instructional materials typically comprise written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses to internet sites that provide such instructional materials.

EXAMPLES

[0126] The following examples are offered to illustrate, but not to limit the present invention.

Example 1 New Apophytochromes From Synechocystis

[0127] WO 98/04700 shows that incubation of phytochrome apoproteins with the linear tetrapyrrole phycoerythrobilin (PEB) produces intensely fluorescent protein complexes known as phytofluors (see, also, Murphy et al. Curr. Biol. 7:870-876 (1997)). FIG. 6 of WO 98/04700 presents a sequence alignment of members of the apophytochrome family including eukaryotic members and one prokaryotic member named S6803phy1 or cph1. This example shows that other apophytochrome genes are present in the cyanobacterium Synechocystis sp. PCC 6803. The encoded polypeptides covalently associate with bilins to produce biliproteins, some with phytochrome-like spectroscopic properties and others with significantly altered absorption spectra.

[0128] This example provides further data that define the chromophore-binding domain on the phytochrome ‘superfamily’ of related proteins which is required for covalent binding of the bilin prosthetic group (the chromophore domain). This chromophore domain has been delimited to a region roughly 200 amino acids in length.

[0129] The multiple sequence alignment shown in FIG. 6 of WO 98/04700 was used to construct a protein profile (as described by Gribskov, M. & Veretnik, S., Identification of sequence pattern with profile analysis, in Methods in Enzymology, 1996: 198-212) for profile searching of the Cyanobase cyanobacterial database (Cyanobase Web Site, http://www.kazusa.or.jp/cyano/).
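The idea of scoring database sequences against a profile derived from a multiple alignment can be illustrated with a minimal Python sketch. It is not the Gribskov profile method itself, which weights columns with a substitution matrix and allows gaps; it is a simplified, ungapped log-odds profile under a uniform background. The two aligned segments are the 11 residues beginning at Cys259 of Cph1 and at Cys129 of Cph2 in the sequence listing, treated here as an ungapped alignment purely for illustration. The query sequence, the pseudocount, and the function names build_profile and best_window are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, ungapped sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


# Segments transcribed from the sequence listing (Cph1 259-269, Cph2 129-139).
ALIGNMENT = ["CHLTYLKNMGV", "CHIQYLLAMGV"]
# Hypothetical query containing a related segment starting at index 8.
QUERY = "MKTAAGDS" + "CHLQYLKAMGV" + "SSLT"

PROFILE = build_profile(ALIGNMENT)
SCORE, START = best_window(PROFILE, QUERY)
```

A real profile search would use the full alignment, position-specific gap penalties, and a statistical threshold for reporting hits.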

[0130] Based upon this search, seven additional apophytochrome protein sequences (i.e., cph2-8) were identified, one of which (cph2) has two phytochrome-related domains. The genetic loci of these sequences in the genome of the cyanobacterium Synechocystis sp. PCC 6803 are shown in Table 1.

TABLE 1
Phytochrome-Related Sequences in the Genome of Synechocystis sp. PCC 6803.

Gene ID        Cyanobase Locus    Protein Length (aa)
Cph1           SLR0473             748
Cph2           SLL0821            1276
Cpl1 (Cph3)    SLL1473             481
Cpl2 (Cph4)    SLL1124            1372
Cpl3 (Cph5)    SLL0041             891
Cpl4 (Cph6)    SLR1212             844
Cpl5 (Cph7)    SLR1393             950
Cpl6 (Cph8)    SLR1969             750
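Because Table 1 gives two names for six of the genes (Cpl1-6 and Cph3-8), a small lookup can help keep the naming schemes straight when working with the sequences. The following Python sketch simply encodes Table 1; the names TABLE_1, canonical_gene_id and lookup are hypothetical and are not part of the original disclosure.

```python
# Table 1 entries keyed by gene ID: (Cyanobase locus, protein length in aa).
TABLE_1 = {
    "cph1": ("SLR0473", 748),
    "cph2": ("SLL0821", 1276),
    "cph3": ("SLL1473", 481),
    "cph4": ("SLL1124", 1372),
    "cph5": ("SLL0041", 891),
    "cph6": ("SLR1212", 844),
    "cph7": ("SLR1393", 950),
    "cph8": ("SLR1969", 750),
}


def canonical_gene_id(name):
    """Map either naming scheme (Cpl1-6 or Cph1-8) to a lowercase cph ID."""
    key = name.strip().lower()
    if key.startswith("cpl"):
        suffix = key[3:]
        if not suffix.isdigit() or not 1 <= int(suffix) <= 6:
            raise KeyError(name)
        # Per Table 1, CplN is the same gene as Cph(N + 2).
        key = f"cph{int(suffix) + 2}"
    if key not in TABLE_1:
        raise KeyError(name)
    return key


def lookup(name):
    """Return (Cyanobase locus, protein length in aa) for a gene name."""
    return TABLE_1[canonical_gene_id(name)]
```

For example, lookup("Cpl1") and lookup("Cph3") both return the entry for locus SLL1473.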

[0131] FIG. 1 shows the putative structural organization of the predicted protein products of these open reading frames. The roughly 200 amino acid chromophore domain of these proteins, which includes the cysteine site of bilin attachment, is shown with an asterisk. Two major subfamilies have been defined: the first subfamily consists of higher plant phytochrome-like domains, which include cph1 and the N-terminal domain of cph2 (i.e., cph2a), and the second subfamily consists of cpl (phytochrome-like) 1-6 (also referred to as cph3-8) and the second domain of cph2 (i.e., cph2b), which have a roughly 30 amino acid deletion adjacent to the predicted cysteine site for bilin attachment. A multiple sequence alignment of the chromophore domains of representative eukaryotic phytochromes, which include the Arabidopsis phyA, phyB/D, phyC and phyE proteins and the green algal phytochrome (Mcphy1b), and of the cyanobacterial phytochrome-related sequences, shown in FIG. 2, was obtained using the Wisconsin Genetics Computing Group program PILEUP. This multiple sequence alignment defines the bilin chromophore binding domain of the phytochrome superfamily.

[0132] To demonstrate this, DNA fragments that encompass the chromophore-related domains of cph1, cph2a, cph2b and cph8 were isolated by PCR, cloned into an expression vector, and tested as described in WO 98/04700. These experiments confirmed that cph1, cph1 (N514 deletion mutant), and cph2a (N390, consisting of the first 390 amino acid residues) all yield photochemically active phytochrome-like species upon incubation with PCB. Cph2a(N390), cph2b(C423) and cph8 polypeptides also covalently bind PCB. The PCB adducts of cph2b and cph8 are both photochemically inactive and absorb mostly in the blue wavelength region. This shows that the cph2b/cph3/cph4/cph5/cph6/cph7/cph8 phytochrome ‘subfamily’ is able to catalyze bilin attachment and that, like cph1/cph2a, its members are apophytochromes. This work thus confirms that the structural motif illustrated in FIG. 2 represents the bilin-binding domain of the ‘phytochrome superfamily’ from which new phytofluors of different colors can be devised. It also defines two subfamilies of cyanobacterial ‘phytochromes’: cph1/2a, whose adducts are red/far-red photoreversible, and cph2b/cph3/cph4/cph5/cph6/cph7/cph8, whose adducts absorb blue light and are non-photoreversible.

Example 2 Further Definition of the Chromophore Domain

[0133] This example presents experimental data demonstrating that a phytofluor can be produced by expression of an apoprotein polypeptide of only 197 amino acids in length. This result defines a roughly 200 amino acid bilin binding domain motif that is sufficient for covalent attachment of a bilin chromophore precursor to produce photoactive adducts (i.e., phytochromes) and fluorescent adducts (i.e., phytofluors). In addition, mutagenesis of some of the fully conserved residues within this domain of cph1 has confirmed the importance of these residues for bilin assembly and for the spectroscopic properties of the resulting bilin adducts.

[0134] Example 1 provides evidence of two classes of apophytochromes. The first class possesses a 200 amino acid protein motif termed ‘module 1’ that is similar to eukaryotic phytochromes. Members of this class include Cph1 and Cph2 (i.e., a region encompassing the N-terminal 390 amino acids), both of which yield red, far-red photoreversible phycocyanobilin (PCB) adducts. The second class, typified by Cph8 and the C-terminal 423 amino acid region of Cph2, affords ‘non-photoactive’ biliproteins upon incubation with PCB. Members of this class possess a 200 amino acid domain (i.e., module 2) that is distinguished from the first class by a conspicuous deletion of 17-22 amino acids near the cysteine attachment site.

[0135] Based on the multiple sequence alignment shown in FIG. 2, and using methods described in WO 98/04700, a bacterial expression vector expressing a Strep-tagged version of the N-terminal 197 amino acid region of Cph2 (Cph2-N197), which encompasses ‘module 1’ only, was constructed (see FIG. 3). E. coli cells containing this plasmid produced the Strep-tagged protein of the expected size (i.e., 25 kDa), which yielded a covalent biliprotein adduct upon incubation with PCB, as detected by zinc blot visualization. Purification of the PCB-Cph2 adduct yielded a protein possessing phytochrome-like optical properties, including a red minus far-red difference spectrum. PEB treatment of extracts containing Cph2-N197 also produced a fluorescent species with excitation and emission properties nearly indistinguishable from those of other phytofluors. Taken together, these experiments demonstrate that the module 1 region of cph2 is competent to covalently bind bilins and therefore represents a functional chromophore domain.

[0136] To address the importance of conserved residues in the bilin lyase domain of Cph1 to bilin attachment and to the spectral properties of its bilin adducts, site-directed mutagenesis of some of these residues was undertaken. Charged residues D171, R172, E189 and R222 of Cph1 (N514) were mutagenized in the initial studies. The results of these experiments are summarized in Table 2. These experiments show that E189 is required for bilin binding, since its conversion to the amino acids A, G, K or T produces an apoprotein that is incapable of bilin binding. By contrast, single missense mutations of residues D171, R172 and R222 did not abolish bilin binding, but instead led to phytochromes with altered spectral properties. Double mutants of residues D171 and R172 also failed to bind bilin. Taken together, these experiments underscore the importance of these residues to the photosensory activity of phytochrome, and show that mutagenesis of the bilin lyase domain can be used for ‘tuning’ the photochemical and/or fluorescence properties of apophytochrome-bilin adducts.

TABLE 2
Bilin Binding and Photochemical Activity of PCB Adducts of Site-Directed Mutants of Cph1 (N514).

Mutant                              PCB Binding    Photoreversibility    Difference Spectrum
Single Mutants
Cph1 (N514) Wild Type               yes            yes                   WT
Cph1 (N514) D171A                   yes            yes                   WT
Cph1 (N514) R172G/A                 yes            yes                   blue shifted
Cph1 (N514) E189A, G, K or T        no             no                    none
Cph1 (N514) R222G                   yes            yes                   blue shifted
Double Mutants
Cph1 (N514) DR171/2AA               no             no                    none
Cph1 (N514) DR171/2AG               no             no                    none
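The mutant names in Table 2 follow the usual wild-type residue, position, replacement notation (e.g., E189A). As an illustration only, the following Python sketch applies such substitutions in silico to residues 161-224 of Cph1, transcribed to one-letter code from the first entry of the sequence listing (locus SLR0473), and checks that the stated wild-type residue is actually present at each position. The function name apply_mutation and the choice of fragment are hypothetical; the sketch says nothing about how the mutants were constructed experimentally.

```python
import re

# Residues 161-224 of Cph1 (locus SLR0473), one-letter code, from the sequence listing.
CPH1_FRAGMENT = (
    "VEEVRRMTGFDRVMLY"   # 161-176
    "RFDENNHGDVIAEDKR"   # 177-192
    "DDMEPYLGLHYPESDI"   # 193-208
    "PQPARRLFIHNPIRVI"   # 209-224
)
FRAGMENT_START = 161  # residue number of the first character above


def apply_mutation(fragment, start, mutation):
    """Apply a point mutation such as 'E189A' to a numbered sequence fragment."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - start
    if not 0 <= index < len(fragment):
        raise ValueError(f"position {position} is outside the fragment")
    if fragment[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {fragment[index]}"
        )
    return fragment[:index] + replacement + fragment[index + 1:]


# Single mutant from Table 2.
e189a = apply_mutation(CPH1_FRAGMENT, FRAGMENT_START, "E189A")

# Double mutant DR171/2AA, applied as two successive substitutions.
dr_aa = apply_mutation(CPH1_FRAGMENT, FRAGMENT_START, "D171A")
dr_aa = apply_mutation(dr_aa, FRAGMENT_START, "R172A")
```

The wild-type check catches numbering errors: a request such as K189A raises an error because residue 189 in the listing is glutamate.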

[0137] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

1 24 1 748 PRT Unknown Description of Unknown Organism Locus SLR0473 = S6803PHY1 = SYN_PHY 1 Met Ala Thr Thr Val Gln Leu Ser Asp Gln Ser Leu Arg Gln Leu Glu 1 5 10 15 Thr Leu Ala Ile His Thr Ala His Leu Ile Gln Pro His Gly Leu Val 20 25 30 Val Val Leu Gln Glu Pro Asp Leu Thr Ile Ser Gln Ile Ser Ala Asn 35 40 45 Cys Thr Gly Ile Leu Gly Arg Ser Pro Glu Asp Leu Leu Gly Arg Thr 50 55 60 Leu Gly Glu Val Phe Asp Ser Phe Gln Ile Asp Pro Ile Gln Ser Arg 65 70 75 80 Leu Thr Ala Gly Gln Ile Ser Ser Leu Asn Pro Ser Lys Leu Trp Ala 85 90 95 Arg Val Met Gly Asp Asp Phe Val Ile Phe Asp Gly Val Phe His Arg 100 105 110 Asn Ser Asp Gly Leu Leu Val Cys Glu Leu Glu Pro Ala Tyr Thr Ser 115 120 125 Asp Asn Leu Pro Phe Leu Gly Phe Tyr His Met Ala Asn Ala Ala Leu 130 135 140 Asn Arg Leu Arg Gln Gln Ala Asn Leu Arg Asp Phe Tyr Asp Val Ile 145 150 155 160 Val Glu Glu Val Arg Arg Met Thr Gly Phe Asp Arg Val Met Leu Tyr 165 170 175 Arg Phe Asp Glu Asn Asn His Gly Asp Val Ile Ala Glu Asp Lys Arg 180 185 190 Asp Asp Met Glu Pro Tyr Leu Gly Leu His Tyr Pro Glu Ser Asp Ile 195 200 205 Pro Gln Pro Ala Arg Arg Leu Phe Ile His Asn Pro Ile Arg Val Ile 210 215 220 Pro Asp Val Tyr Gly Val Ala Val Pro Leu Thr Pro Ala Val Asn Pro 225 230 235 240 Ser Thr Asn Arg Ala Val Asp Leu Thr Glu Ser Ile Leu Arg Ser Ala 245 250 255 Tyr His Cys His Leu Thr Tyr Leu Lys Asn Met Gly Val Gly Ala Ser 260 265 270 Leu Thr Ile Ser Leu Ile Lys Asp Gly His Leu Trp Gly Leu Ile Ala 275 280 285 Cys His His Gln Thr Pro Lys Val Ile Pro Phe Glu Leu Arg Lys Ala 290 295 300 Cys Glu Phe Phe Gly Arg Val Val Phe Ser Asn Ile Ser Ala Gln Glu 305 310 315 320 Asp Thr Glu Thr Phe Asp Tyr Arg Val Gln Leu Ala Glu His Glu Ala 325 330 335 Val Leu Leu Asp Lys Met Thr Thr Ala Ala Asp Phe Val Glu Gly Leu 340 345 350 Thr Asn His Pro Asp Arg Leu Leu Gly Leu Thr Gly Ser Gln Gly Ala 355 360 365 Ala Ile Cys Phe Gly Glu Lys Leu Ile Leu Val Gly Glu Thr Pro Asp 370 375 380 Glu Lys Ala Val Gln Tyr Leu Leu Gln Trp Leu Glu Asn Arg Glu Val 385 390 395 400 Gln 
Asp Val Phe Phe Thr Ser Ser Leu Ser Gln Ile Tyr Pro Asp Ala 405 410 415 Val Asn Phe Lys Ser Val Ala Ser Gly Leu Leu Ala Ile Pro Ile Ala 420 425 430 Arg His Asn Phe Leu Leu Trp Phe Arg Pro Glu Val Leu Gln Thr Val 435 440 445 Asn Trp Gly Gly Asp Pro Asn His Ala Tyr Glu Ala Thr Gln Glu Asp 450 455 460 Gly Lys Ile Glu Leu His Pro Arg Gln Ser Phe Asp Leu Trp Lys Glu 465 470 475 480 Ile Val Arg Leu Gln Ser Leu Pro Trp Gln Ser Val Glu Ile Gln Ser 485 490 495 Ala Leu Ala Leu Lys Lys Ala Ile Val Asn Leu Ile Leu Arg Gln Ala 500 505 510 Glu Glu Leu Ala Gln Leu Ala Arg Asn Leu Glu Arg Ser Asn Ala Asp 515 520 525 Leu Lys Lys Phe Ala Tyr Ile Ala Ser His Asp Leu Gln Glu Pro Leu 530 535 540 Asn Gln Val Ser Asn Tyr Val Gln Leu Leu Glu Met Arg Tyr Ser Glu 545 550 555 560 Ala Leu Asp Glu Asp Ala Lys Asp Phe Ile Asp Phe Ala Val Thr Gly 565 570 575 Val Ser Leu Met Gln Thr Leu Ile Asp Asp Ile Leu Thr Tyr Ala Lys 580 585 590 Val Asp Thr Gln Tyr Ala Gln Leu Thr Phe Thr Asp Val Gln Glu Val 595 600 605 Val Asp Lys Ala Leu Ala Asn Leu Lys Gln Arg Ile Glu Glu Ser Gly 610 615 620 Ala Glu Ile Glu Val Gly Ser Met Pro Ala Val Met Ala Asp Gln Ile 625 630 635 640 Gln Leu Met Gln Val Phe Gln Asn Leu Ile Ala Asn Gly Ile Lys Phe 645 650 655 Ala Gly Asp Lys Ser Pro Lys Ile Lys Ile Trp Gly Asp Arg Gln Glu 660 665 670 Asp Ala Trp Val Phe Ala Val Gln Asp Asn Gly Ile Gly Ile Asp Pro 675 680 685 Gln Phe Phe Glu Arg Ile Phe Val Ile Phe Gln Arg Leu His Thr Arg 690 695 700 Asp Glu Tyr Lys Gly Thr Gly Met Gly Leu Ala Ile Cys Lys Lys Ile 705 710 715 720 Ile Glu Gly His Gln Gly Gln Ile Trp Leu Glu Ser Asn Pro Gly Glu 725 730 735 Gly Ser Thr Phe Tyr Phe Ser Ile Pro Ile Gly Asn 740 745 2 1276 PRT Unknown Description of Unknown Organismcph2 Locus SLL0821 2 Met Asn Pro Asn Arg Ser Leu Glu Asp Phe Leu Arg Asn Val Ile Asn 1 5 10 15 Lys Phe His Arg Ala Leu Thr Leu Arg Glu Thr Leu Gln Val Ile Val 20 25 30 Glu Glu Ala Arg Ile Phe Leu Gly Val Asp Arg Val Lys Ile Tyr Lys 35 40 45 Phe Ala Ser Asp Gly Ser Gly Glu Val Leu Ala 
Glu Ala Val Asn Arg 50 55 60 Ala Ala Leu Pro Ser Leu Leu Gly Leu His Phe Pro Val Glu Asp Ile 65 70 75 80 Pro Pro Gln Ala Arg Glu Glu Leu Gly Asn Gln Arg Lys Met Ile Ala 85 90 95 Val Asp Val Ala His Arg Arg Lys Lys Ser His Glu Leu Ser Gly Arg 100 105 110 Ile Ser Pro Thr Glu His Ser Asn Gly His Tyr Thr Thr Val Asp Ser 115 120 125 Cys His Ile Gln Tyr Leu Leu Ala Met Gly Val Leu Ser Ser Leu Thr 130 135 140 Val Pro Val Met Gln Asp Gln Gln Leu Trp Gly Ile Met Ala Val His 145 150 155 160 His Ser Lys Pro Arg Arg Phe Thr Glu Gln Glu Trp Glu Thr Met Ala 165 170 175 Leu Leu Ser Lys Glu Val Ser Leu Ala Ile Thr Gln Ser Gln Leu Ser 180 185 190 Arg Gln Val His Gln Gln Gln Val Gln Glu Ala Leu Val Gln Arg Leu 195 200 205 Glu Thr Thr Val Ala Gln Tyr Gly Asp Arg Pro Glu Thr Trp Gln Tyr 210 215 220 Ala Leu Glu Thr Val Gly Gln Ala Val Glu Ala Asp Gly Ala Val Leu 225 230 235 240 Tyr Ile Ala Pro Asp Leu Thr Gly Ser Val Ala Gln His Tyr Gln Trp 245 250 255 Asn Leu Arg Phe Asp Trp Gly Asn Trp Leu Glu Thr Ser Leu Trp Gln 260 265 270 Glu Leu Met Arg Gly Gln Pro Ser Ala Ala Met Glu Pro Met Ala Ala 275 280 285 Val Gln Ser Thr Trp Glu Lys Pro Arg Pro Phe Thr Ser Val Ala Pro 290 295 300 Leu Pro Pro Thr Asn Cys Val Pro His Gly Tyr Thr Leu Gly Glu Leu 305 310 315 320 Glu Gln Arg Ser Asp Trp Ile Ala Pro Pro Glu Ser Leu Ser Ala Glu 325 330 335 Asn Phe Gln Ser Phe Leu Ile Val Pro Leu Ala Ala Asp Gln Gln Trp 340 345 350 Val Gly Ser Leu Ile Leu Leu Arg Lys Glu Lys Ser Leu Val Lys His 355 360 365 Trp Ala Gly Lys Arg Gly Ile Asp Arg Arg Asn Ile Leu Pro Arg Leu 370 375 380 Ser Phe Glu Ala Trp Glu Glu Thr Gln Lys Leu Val Pro Thr Trp Asn 385 390 395 400 Arg Ser Glu Arg Lys Leu Ala Gln Val Ala Ser Thr Gln Leu Tyr Met 405 410 415 Ala Ile Thr Gln Gln Phe Val Thr Arg Leu Ile Thr Gln Gln Thr Ala 420 425 430 Tyr Asp Pro Leu Thr Gln Leu Pro Asn Trp Ile Ile Phe Asn Arg Gln 435 440 445 Leu Thr Leu Ala Leu Leu Asp Ala Leu Tyr Glu Gly Lys Met Val Gly 450 455 460 Val Leu Val Ile Ala Met Asp Arg Phe Lys Arg Ile Asn Glu 
Ser Phe 465 470 475 480 Gly His Lys Thr Gly Asp Gly Leu Leu Gln Glu Val Ala Asp Arg Leu 485 490 495 Asn Gln Lys Leu Ser Pro Leu Ala Ala Tyr Ser Pro Leu Leu Ser Arg 500 505 510 Trp His Gly Asp Gly Phe Thr Ile Leu Leu Thr Gln Ile Ser Asp Asn 515 520 525 Gln Glu Met Ile Pro Leu Cys Glu Arg Leu Leu Ser Thr Phe Gln Glu 530 535 540 Pro Phe Phe Leu Gln Gly Gln Pro Ile Tyr Leu Thr Ala Ser Met Gly 545 550 555 560 Ile Ser Thr Ala Pro Tyr Asp Gly Glu Thr Ala Glu Ser Leu Leu Lys 565 570 575 Phe Ala Glu Ile Ala Leu Thr Arg Ala Lys Cys Gln Gly Lys Asn Thr 580 585 590 Tyr Gln Phe Tyr Arg Pro Gln Asp Ser Ala Pro Met Leu Asp Arg Leu 595 600 605 Thr Leu Glu Ser Asp Leu Arg Gln Ala Leu Thr Asn Gln Glu Phe Val 610 615 620 Leu Tyr Phe Gln Pro Gln Val Ala Leu Asp Thr Gly Lys Leu Leu Gly 625 630 635 640 Val Glu Ala Leu Val Arg Trp Gln His Pro Arg Leu Gly Gln Val Ala 645 650 655 Pro Asp Val Phe Ile Pro Leu Ala Glu Glu Leu Gly Leu Ile Asn His 660 665 670 Leu Gly Gln Trp Val Leu Glu Thr Ala Cys Ala Thr His Gln His Phe 675 680 685 Phe Arg Glu Thr Gly Arg Arg Leu Arg Met Ala Val Asn Ile Ser Ala 690 695 700 Arg Gln Phe Gln Asp Glu Lys Trp Leu Asn Ser Val Leu Glu Cys Leu 705 710 715 720 Lys Arg Thr Gly Met Pro Pro Glu Asp Leu Glu Leu Glu Ile Thr Glu 725 730 735 Ser Leu Met Met Glu Asp Ile Lys Gly Thr Val Val Leu Leu His Arg 740 745 750 Leu Arg Glu Glu Gly Val Gln Val Ala Ile Asp Asp Phe Gly Thr Gly 755 760 765 Tyr Ser Ser Leu Ser Ile Leu Lys Gln Leu Pro Ile His Arg Leu Lys 770 775 780 Ile Asp Lys Ser Phe Val Asn Asp Leu Leu Asn Glu Gly Ala Asp Thr 785 790 795 800 Ala Ile Ile Gln Tyr Val Ile Asp Leu Ala Asn Gly Leu Asn Leu Glu 805 810 815 Thr Val Ala Glu Gly Ile Glu Ser Glu Ala Gln Leu Gln Arg Leu Gln 820 825 830 Lys Met Gly Cys His Leu Gly Gln Gly Tyr Phe Leu Thr Arg Pro Leu 835 840 845 Pro Ala Glu Ala Met Met Thr Tyr Leu Tyr Tyr Pro Gln Ile Leu Asp 850 855 860 Phe Gly Pro Thr Pro Pro Leu Pro Lys Val Ala Leu Pro Glu Thr Glu 865 870 875 880 Thr Glu Ala Gly Gln Gly Asn Val Gly Asp Arg Pro Leu Pro 
Asn Ser 885 890 895 Leu Asn Arg Glu Asn Pro Trp Thr Glu Lys Leu His Asp Tyr Val Leu 900 905 910 Leu Lys Glu Arg Leu Gln Gln Arg Asn Val Lys Glu Lys Leu Val Leu 915 920 925 Lys Ile Ala Asn Lys Ile Arg Ala Ser Leu Asn Ile Asn Asp Ile Leu 930 935 940 Tyr Ser Thr Val Thr Glu Val Arg Gln Phe Leu Asn Thr Asp Arg Val 945 950 955 960 Val Leu Phe Lys Phe Asn Ser Gln Trp Ser Gly Gln Val Val Thr Glu 965 970 975 Ser His Asn Asp Phe Cys Arg Ser Ile Ile Asn Asp Glu Ile Asp Asp 980 985 990 Pro Cys Phe Lys Gly His Tyr Leu Arg Leu Tyr Arg Glu Gly Arg Val 995 1000 1005 Arg Ala Val Ser Asp Ile Glu Lys Ala Asp Leu Ala Asp Cys His Lys 1010 1015 1020 Glu Leu Leu Arg His Tyr Gln Val Lys Ala Asn Leu Val Val Pro Val 1025 1030 1035 1040 Val Phe Asn Glu Asn Leu Trp Gly Leu Leu Ile Ala His Glu Cys Lys 1045 1050 1055 Thr Pro Arg Tyr Trp Gln Glu Glu Asp Leu Gln Leu Leu Met Glu Leu 1060 1065 1070 Ala Thr Gln Val Ala Ile Ala Ile His Gln Gly Glu Leu Tyr Glu Gln 1075 1080 1085 Leu Glu Thr Ala Asn Ile Arg Leu Gln Gln Ile Ser Ser Leu Asp Ala 1090 1095 1100 Leu Thr Gln Val Gly Asn Arg Tyr Leu Phe Asp Ser Thr Leu Glu Arg 1105 1110 1115 1120 Glu Trp Gln Arg Leu Gln Arg Ile Arg Glu Pro Leu Ala Leu Leu Leu 1125 1130 1135 Cys Asp Val Asp Phe Phe Lys Gly Phe Asn Asp Asn Tyr Gly His Pro 1140 1145 1150 Ala Gly Asp Arg Cys Leu Lys Lys Ile Ala Asp Ala Met Ala Lys Val 1155 1160 1165 Ala Lys Arg Pro Thr Asp Leu Val Ala Arg Tyr Gly Gly Glu Glu Phe 1170 1175 1180 Ala Ile Ile Leu Ser Glu Thr Ser Leu Glu Gly Ala Ile Asn Val Thr 1185 1190 1195 1200 Glu Ala Leu Gln Val Glu Val Ala Asn Leu Ala Ile Pro His Thr Val 1205 1210 1215 Ser Gly Thr Gly His Val Thr Leu Ser Ile Gly Ile Ala Val Tyr Thr 1220 1225 1230 Pro Glu Arg His Ile Asn Pro Asn Ala Leu Val Lys Ala Ala Asp Leu 1235 1240 1245 Ala Leu Tyr Glu Ala Lys Ala Lys Gly Arg Asn Gln Trp Leu Ala Tyr 1250 1255 1260 Glu Gly Ser Gln Leu Pro His Val Asp Gly Glu Val 1265 1270 1275 3 481 PRT Unknown Description of Unknown Organismcph Lucus SLL1473 a 297 aa histidine kinase 
homolog 3 Met Gly Lys Phe Leu Ile Pro Ile Glu Phe Val Phe Leu Ala Ile Ala 1 5 10 15 Met Thr Cys Tyr Leu Trp His Arg Gln Asn Gln Glu Arg Arg Arg Ile 20 25 30 Glu Ile Ser Ile Lys Gln Gln Thr Gln Arg Glu Arg Phe Ile Asn Gln 35 40 45 Ile Thr Gln His Ile Arg Gln Ser Leu Asn Leu Glu Thr Val Leu Asn 50 55 60 Thr Thr Val Ala Glu Val Lys Thr Leu Leu Gln Val Asp Arg Val Leu 65 70 75 80 Ile Tyr Arg Ile Trp Gln Asp Gly Thr Gly Ser Ala Ile Thr Glu Ser 85 90 95 Val Asn Ala Asn Tyr Pro Ser Ile Leu Gly Arg Thr Phe Ser Asp Glu 100 105 110 Val Phe Pro Val Glu Tyr His Gln Ala Tyr Thr Lys Gly Lys Val Arg 115 120 125 Ala Ile Asn Asp Ile Asp Gln Asp Asp Ile Glu Ile Cys Leu Ala Asp 130 135 140 Phe Val Lys Gln Phe Gly Val Lys Ser Lys Leu Val Val Pro Ile Leu 145 150 155 160 Gln His Asn Arg Ala Ser Ser Leu Asp Asn Glu Ser Glu Phe Pro Tyr 165 170 175 Leu Trp Gly Leu Leu Ile Thr His Gln Cys Ala Phe Thr Arg Pro Trp 180 185 190 Gln Pro Trp Glu Val Glu Leu Met Lys Gln Leu Ala Asn Gln Val Ala 195 200 205 Ile Ala Ile Gln Gln Ser Glu Leu Tyr Glu Gln Leu Gln Gln Leu Asn 210 215 220 Lys Asp Leu Glu Asn Arg Val Glu Lys Arg Thr Gln Gln Leu Ala Ala 225 230 235 240 Thr Asn Gln Ser Leu Arg Met Glu Ile Ser Glu Arg Gln Lys Thr Glu 245 250 255 Ala Ala Leu Arg His Thr Asn His Thr Leu Gln Ser Leu Ile Ala Ala 260 265 270 Ser Pro Arg Gly Ile Phe Thr Leu Asn Leu Ala Asp Gln Ile Gln Ile 275 280 285 Trp Asn Pro Thr Ala Glu Arg Ile Phe Gly Trp Thr Glu Thr Glu Ile 290 295 300 Ile Ala His Pro Glu Leu Leu Thr Ser Asn Ile Leu Leu Glu Asp Tyr 305 310 315 320 Gln Gln Phe Lys Gln Lys Val Leu Ser Gly Met Val Ser Pro Ser Leu 325 330 335 Glu Leu Lys Cys Gln Lys Lys Asp Gly Ser Trp Ile Glu Ile Val Leu 340 345 350 Ser Ala Ala Pro Leu Leu Asp Ser Glu Glu Asn Ile Ala Gly Leu Val 355 360 365 Ala Val Val Ala Asp Ile Thr Glu Gln Lys Arg Gln Ala Glu Gln Ile 370 375 380 Arg Leu Leu Gln Ser Val Val Val Asn Thr Asn Asp Ala Val Val Ile 385 390 395 400 Thr Glu Ala Glu Pro Ile Asp Asp Pro Gly Pro Arg Ile Leu Tyr Val 405 410 415 Asn Glu Ala 
Phe Thr Lys Ile Thr Gly Tyr Thr Ala Glu Glu Met Leu 420 425 430 Gly Lys Thr Pro Arg Val Leu Gln Gly Pro Lys Thr Ser Arg Thr Glu 435 440 445 Leu Asp Arg Val Arg Gln Ala Ile Ser Gln Trp Gln Ser Val Thr Val 450 455 460 Glu Ala Glu Val Leu Asn Asp Ser Tyr Lys Glu Lys Lys Ser Pro Leu 465 470 475 480 Lys 4 1371 PRT Unknown Description of Unknown Organismcph4 locus SLL1124 (DivJ homolog PAS domain) a 1372 aa protein that is more similar to rcaE than to cph1 4 Met Thr Phe Ala Ala Thr Pro Arg Glu Val Thr Ala Ser Ala Ile Gln 1 5 10 15 Trp Ala Cys Leu Cys Leu Pro Gly Glu Leu Ser Ala Ala Glu Ala Leu 20 25 30 Asn Arg Trp His Arg His Gly Gln Arg Ser Trp Glu Pro Pro Ala Glu 35 40 45 Ala Lys Ala Phe Pro Pro Trp Ala Leu Val Leu Asp Asn Asp Gly Gln 50 55 60 Leu Leu Gly Leu Leu Pro Asp Trp Gln Leu Ala Ala Ala Leu Trp Thr 65 70 75 80 Glu His Phe Ser Pro Ala Ile Ala Leu Ala Glu Leu Cys Leu Pro Cys 85 90 95 Ser Leu Arg Leu Asp Leu Glu Lys Leu Pro Ser Leu Gly Glu Val Met 100 105 110 Gln Ile Phe Ala Thr Trp Gly Tyr Gly Trp Asp Val Ile Pro Val Ala 115 120 125 Asp Arg Gln His Gln Thr Trp Gly Leu Leu Ser Ile Gly Asn Leu Ile 130 135 140 Arg Ser Val Asn Leu Cys Gln Leu Trp Gln Asn Leu Pro Leu Gln Val 145 150 155 160 Thr Ala Ser Pro Pro Leu Cys Leu Gly Thr Glu Thr Thr Leu Gly Glu 165 170 175 Leu Val His His Cys Phe Glu Arg Gln Ile Ser Ser Phe Pro Val Val 180 185 190 Tyr Ser Ser Pro Leu Leu Pro Ala Ala Ala Pro Arg Ile Pro Leu Gly 195 200 205 Asn Val Ser Leu Ser Asn Tyr Phe Lys Gly Pro Asn Tyr Gly Ser Leu 210 215 220 Gly Leu Asp Asn Pro Ile Gly Pro Asp Leu Ser Pro Thr Phe Pro Leu 225 230 235 240 Cys Thr Ile Asn Gln Thr Tyr Cys His Ala Arg Glu Leu Leu Arg Arg 245 250 255 Gln Asn Asp Asp Tyr Val Ile Ile Thr Asn Ile Ser Gly Ala Phe Val 260 265 270 Gly Trp Val Gly Pro Gln Gln Trp Leu Ala Thr Val Gln Pro Asp Val 275 280 285 Leu Leu Glu Ala Leu Gln Arg Glu Val Glu Met Pro Arg Ile Val Gln 290 295 300 His Leu Glu Ala Arg Ile Val Trp Gln Gln Gln Gln Gln Gln Arg Asn 305 310 315 320 Gln His Leu Ile Gln Lys 
Leu Leu Ser Arg Asn Pro Asn Leu Ile Tyr 325 330 335 Leu Tyr Asp Leu Val Lys Asn Glu Ile Val Tyr Leu Asn Ile Pro Gly 340 345 350 Ser Leu Leu Glu Gly Gly Ser Gly Gly Ala Pro Ile Pro Asn Pro Met 355 360 365 Val Glu Thr Asp Pro Arg Gln Asp Leu Leu Leu Pro Pro Arg Tyr Phe 370 375 380 Gly Leu Glu Glu Leu Ala Ala Leu Gln Ala His Glu Lys Lys Glu Phe 385 390 395 400 Asn Phe Glu Phe Thr Asp Gly Gly Gln Ser Val His Tyr Phe Val Val 405 410 415 Glu Ile Ser Ala Phe Glu Ile Asp Gly Ser Gly Gln Thr Ser Lys Ile 420 425 430 Leu Cys Leu Ala Gln Asp Val Ser His Gly Lys Arg Ala Glu Ala Ala 435 440 445 Leu His Thr Lys Glu Gln Gln Leu Gln Thr Leu Val Asn Thr Ile Ala 450 455 460 Asp Gly Ile Val Ile Leu Asp Asn His Asp Lys Val Ile Tyr Ala Asn 465 470 475 480 Pro Met Ala Cys Gln Met Phe Gly Leu Ser Lys Glu Glu Phe Leu Gln 485 490 495 Ser Gln Leu Gly Leu Ser Asn Arg Gly Gln Thr Glu Ile Gly Ile Asn 500 505 510 Val Ser Pro Glu Glu Glu Gly Ile Gly Glu Ile Lys Ala Val Pro Ile 515 520 525 His Trp Gln Gly Glu Asp Cys Arg Leu Val Thr Val Arg Asp Val Thr 530 535 540 Asp Arg Gln Arg Val Leu Lys Lys Leu Arg Asp Ser Glu Gln Ile His 545 550 555 560 Arg Ser Leu Leu Glu Ala Leu Pro Asn Leu Val Trp Arg Leu Ser Ser 565 570 575 Ala Gly Asp Val Trp Glu Cys Asn Gln Arg Thr Leu Ala Tyr Phe Gly 580 585 590 Arg Arg Gly Arg Lys Ile Leu Gly Asn Thr Trp Gln Gln Phe Ile Glu 595 600 605 Pro Gly Glu Arg Glu Asn Val Gln Arg Gln Trp Arg Gln Gly Ile Ala 610 615 620 Ala Gln Glu Phe Phe Gln Leu Glu Cys Arg Leu Trp Arg Ser Asp Gly 625 630 635 640 Gln Tyr Arg Trp His Leu Leu Gln Val Leu Pro Leu Glu Asp Arg Phe 645 650 655 Gly Ser Ile Asn Gly Trp Leu Ala Ser Ser Thr Asp Ile Asp Asp Leu 660 665 670 Lys Glu Ala Glu Lys Ala Leu Arg Asn Gln Ala Gln Gln Glu Lys Leu 675 680 685 Leu Ser Ser Ile Ser Gln Arg Ile Arg Glu Ser Leu Lys Leu Glu Thr 690 695 700 Ile Leu Arg Thr Thr Val Thr Glu Val Arg Arg Thr Ile His Ala Asp 705 710 715 720 Arg Val Leu Ile His His Ile Gln Glu Asp Gly Leu Gly Thr Thr Ile 725 730 735 Ala Glu Ser Val Val Asn Gly 
Gln Pro Ser Val Met Gln Met Asp Leu 740 745 750 Ser Pro Glu Ser Phe Pro Pro Glu Cys Tyr Gln Arg Tyr Leu Asn Gly 755 760 765 Tyr Ile Tyr Ala Ser Arg Asp Gln Leu Pro Asp Cys Ala Ile Asn Cys 770 775 780 Ala Val Gln Cys Phe Thr Val Ala Glu Ser Gln Ser Arg Ile Val Ala 785 790 795 800 Pro Ile Val Phe Asp His Ser Leu Trp Gly Leu Leu Ile Val His Gln 805 810 815 Cys Ser Ser Ser Arg Thr Trp Gln Thr Ala Glu Ile Gln Leu Met Gln 820 825 830 Ser Leu Gly Asn Gln Leu Ala Ile Ala Ile Gln Gln Ser Leu Leu Tyr 835 840 845 Glu Arg Leu Gln Glu Glu Leu Ser Glu Arg Gln Arg Ala Glu Gln Lys 850 855 860 Leu Leu Glu Val Asn Gln Leu Gln Lys Gly Ile Phe Asp Val Ala Asn 865 870 875 880 Tyr Met Ile Ile Ser Thr Asp Arg Arg Gly Ile Ile Ser Thr Phe Asn 885 890 895 Arg Thr Ala Glu Glu Ile Leu Gly Tyr Thr Ala Ala Glu Leu Ile Gly 900 905 910 Gln Gln Thr Pro Leu Ile Phe His Asp Gln Glu Glu Met Ala Ser Glu 915 920 925 Ala Val Gln Leu Ser Gln Gln Leu Gln Gln Thr Ile Arg Pro Asn Ser 930 935 940 Ile Asp Met Phe Ala Ile Pro Ala Ile Gln Trp Gly Val Tyr Glu Arg 945 950 955 960 Glu Trp Thr Tyr Ile Thr Lys Thr Gly Asp Arg Leu Pro Val Tyr Val 965 970 975 Ser Ile Thr Ala Leu Arg Asp Asp Gln Gly Lys Val Asp Gly Leu Val 980 985 990 Gly Val Ile Thr Asp Leu Arg Arg Gln Lys Gln Ile Glu Arg Glu Arg 995 1000 1005 Gln Asn Leu Asp Phe Val Val Lys Asn Ser Thr Glu Leu Ile Val Ile 1010 1015 1020 Thr Asp Leu Glu Gln Lys Val Thr Phe Leu Asn Gln Ala Gly Gln Ser 1025 1030 1035 1040 Leu Ile Gly Leu Glu Asn Pro Glu Thr Ala Gln Thr Thr Tyr Leu Ser 1045 1050 1055 Glu His Ile Ser Pro Glu Tyr Leu Asn Phe Trp Gln Met Glu Ile Ile 1060 1065 1070 Pro Gln Val Phe Arg Ser Gly Ala Trp Glu Gly Glu Phe Ser Leu Gln 1075 1080 1085 His Tyr Gln Thr Ala Val Glu Ile Pro Val Thr Ala Ser Val Phe Leu 1090 1095 1100 Leu Gln Gly Val Asn Gly Gln His Pro Ala Asn Leu Val Ala Ile Val 1105 1110 1115 1120 His Asp Ile Thr His Ile Lys Asn Ala Glu Lys Arg Ile Leu Ala Ala 1125 1130 1135 Leu Glu Ala Glu Lys Glu Leu Gly Glu Leu Arg Ser Arg Phe Ile Ser 1140 1145 1150 
Thr Thr Ser His Glu Phe Arg Thr Pro Leu Ala Ile Ile Ser Ser Ser 1155 1160 1165 Thr Gly Ile Leu Lys Lys Tyr Trp Pro Lys Leu Asp Gly Gln Arg Arg 1170 1175 1180 Gly Gln His Leu Glu Arg Ile Glu Glu Ser Val His His Met Val Glu 1185 1190 1195 1200 Leu Leu Asp Asp Val Leu Thr Ile Asn Arg Ala Glu Thr Lys Tyr Leu 1205 1210 1215 Pro Phe Glu Pro Gln Pro Leu Asp Leu Val Ser Phe Cys Arg Gly Ile 1220 1225 1230 Thr Asp Glu Leu Gln Ser Ser Thr Glu Tyr His Gly Leu Leu Phe Ser 1235 1240 1245 Tyr Asp Gly Leu Gly Pro Gly Glu Ile Val Ala Phe Asp Pro Lys Leu 1250 1255 1260 Leu Arg Gln Ile Leu Thr Asn Leu Leu Gly Asn Ala Ile Lys Tyr Ser 1265 1270 1275 1280 Pro Ser Gly Gln Pro Val Glu Phe His Leu Gln Arg Arg Gly Asp Val 1285 1290 1295 Gly Ile Phe Ser Val Gln Asp His Gly Ile Gly Ile Gly Pro Glu Asp 1300 1305 1310 Ile Pro Asn Leu Phe Asp Ser Phe Tyr Arg Gly Thr Asn Val Gly Ser 1315 1320 1325 Ile Pro Gly Thr Gly Leu Gly Leu Pro Ile Val Lys Lys Cys Ala Glu 1330 1335 1340 Leu His Gly Gly Met Ile Thr Val Thr Ser Gln Leu Gly Gln Gly Ser 1345 1350 1355 1360 Arg Phe Glu Val Glu Leu Pro Leu Trp Tyr Ser 1365 1370 5 891 PRT Unknown Description of Unknown Organismcph5 locus SLL0041 (locus 1001300) an 891 aa protein, methyl-accepting chemotaxis protein I. Homology to tsr in last 250 aa. 
5 Met Ala Glu Ala Phe Ile Ala Glu Asn Thr Ala Val Glu Asp Val Ser 1 5 10 15 Pro Asn Pro Asn Pro Ala Ile Asp Thr Asp Ala Leu Ala Ala Leu Thr 20 25 30 Gln Ser Ala Val Glu Leu Thr Pro Pro Pro Pro Ile Asn Leu Pro Lys 35 40 45 Val Glu Leu Pro Pro Met Gln Pro Leu Ala Pro Leu Met Ala Ile Ala 50 55 60 Asp Pro Asp Asn Leu Ser Pro Met Ser Thr Ser Ile Gln Ala Pro Thr 65 70 75 80 Gln Ser Gly Gly Leu Ser Leu Arg Asn Lys Ala Val Leu Ile Ala Leu 85 90 95 Leu Ile Gly Leu Ile Pro Ala Gly Val Ile Gly Gly Leu Asn Leu Ser 100 105 110 Ser Val Asp Arg Leu Pro Val Pro Gln Thr Glu Gln Gln Val Lys Asp 115 120 125 Ser Thr Thr Lys Gln Ile Arg Asp Gln Ile Leu Ile Gly Leu Leu Val 130 135 140 Thr Ala Val Gly Ala Ala Phe Val Ala Tyr Trp Met Val Gly Glu Asn 145 150 155 160 Thr Lys Ala Gln Thr Ala Leu Ala Leu Lys Ala Lys His Ser His Arg 165 170 175 Asn Leu Asp Gln Pro Leu Ala Val Ala Gly Asp Glu Leu Ala Ile Ala 180 185 190 Asp Gln Thr Ile Asp Ala Leu Ser Ala Gln Val Glu Lys Leu Arg His 195 200 205 Gln Gln Asp Leu Ser Leu Lys Gln Ala Glu Leu Leu Thr Glu Leu Ser 210 215 220 Arg Ala Asn Leu Ser Asp Ile Asp Glu Ile Gln Gly Val Ile Gln Lys 225 230 235 240 Asn Leu Asp Gln Ala Arg Ala Leu Phe Gly Cys Glu Arg Leu Val Phe 245 250 255 Tyr Tyr His Pro Arg Tyr Gln Pro Glu Ala Met Val Val Gln Ala Leu 260 265 270 Asp Leu Thr Thr Gln Gly Leu Ile Asp Ser Lys Asp Pro His Pro Trp 275 280 285 Gly Gln Glu Asp Met Pro Ser Gln Ile Val Ala Ile Asn Asp Thr Ser 290 295 300 Gly Ala Ser Ile Ser Asn Pro His Arg Gln Trp Leu Glu Gln His Gln 305 310 315 320 Val Lys Ala Ser Leu Thr Val Pro Leu His Arg Asp Asn Tyr Pro Leu 325 330 335 Gly Leu Leu Met Ala His His Cys Gln Arg Pro His Gln Trp Glu Met 340 345 350 Arg Glu Arg Gln Phe Leu Gln Gln Leu Thr Glu Glu Leu Gln Thr Thr 355 360 365 Leu Asp Arg Ala Asn Leu Ile Gln Glu Arg Asn Glu Ser Ala Gln Gln 370 375 380 Ala Gln Ile Leu Lys Glu Leu Thr Leu Lys Ile Ser Ala Ala Ile Asn 385 390 395 400 Ser Glu Gln Val Phe Asp Ile Ala Ala Gln Glu Ile Arg Leu Ala Leu 405 410 415 Lys Ala Asp Arg Val 
Ile Val Tyr Arg Phe Asp Ala Thr Trp Ala Gly 420 425 430 Thr Val Ile Val Glu Ser Val Ala Glu Gly Tyr Pro Lys Ala Leu Gly 435 440 445 Ala Thr Ile Ala Asp Pro Cys Phe Ala Asp Ser Tyr Val Glu Lys Tyr 450 455 460 Arg Ser Gly Arg Ile Gln Ala Thr Arg Asp Ile Tyr Asn Ala Gly Leu 465 470 475 480 Thr Pro Cys His Ile Gly Gln Leu Lys Pro Phe Glu Val Lys Ala Asn 485 490 495 Leu Val Ala Pro Ile Asn Tyr Lys Gly Asn Leu Leu Gly Leu Leu Ile 500 505 510 Ala His Gln Cys Ser Gly Pro Arg Asp Trp His Gln Asn Glu Ile Asp 515 520 525 Leu Phe Gly Gln Leu Thr Val Gln Val Gly Leu Ala Leu Glu Arg Ser 530 535 540 Asp Leu Leu Ala Gln Gln Lys Ile Ala Glu Val Glu Gln Arg Gln Met 545 550 555 560 Arg Glu Lys Met Gln Lys Arg Ala Leu Glu Leu Leu Met Glu Val Asp 565 570 575 Pro Val Ser Arg Gly Asp Leu Thr Ile Arg Ala His Val Thr Glu Asp 580 585 590 Glu Ile Gly Thr Ile Ala Asp Ser Tyr Asn Ala Thr Ile Glu Ser Leu 595 600 605 Arg Arg Ile Val Thr Gln Val Gln Thr Ala Ala Ser Gln Phe Thr Glu 610 615 620 Thr Thr Asp Thr Asn Glu Val Ala Val Arg Gln Leu Ala Gln Gln Ala 625 630 635 640 Asn Arg Gln Ala Leu Asp Val Ala Glu Ala Leu Glu Arg Leu Gln Ala 645 650 655 Met Asn Lys Ser Ile Gln Ala Val Ala Glu Asn Ala Ala Gln Ala Glu 660 665 670 Ser Ala Val Gln Arg Ala Thr Gln Thr Val Asp Gln Gly Glu Asp Ala 675 680 685 Met Asn Arg Thr Val Asp Gly Ile Val Ala Ile Arg Glu Thr Val Ala 690 695 700 Ala Thr Ala Lys Gln Val Lys Arg Leu Gly Glu Ser Ser Gln Lys Ile 705 710 715 720 Ser Lys Val Val Asn Leu Ile Gly Ser Phe Ala Asp Gln Thr Asn Leu 725 730 735 Leu Ala Leu Asn Ala Ala Ile Glu Ala Ala His Ala Gly Glu Glu Gly 740 745 750 Arg Gly Phe Ala Val Val Ala Asp Glu Val Arg Ser Leu Ala Arg Gln 755 760 765 Ser Ala Glu Ala Thr Ala Glu Ile Ala Gln Leu Val Ala Thr Ile Gln 770 775 780 Ala Glu Thr Asn Glu Val Val Asn Ala Met Glu Ala Gly Thr Glu Gln 785 790 795 800 Val Val Val Gly Thr Lys Leu Val Glu Glu Thr Arg Arg Ser Leu Asn 805 810 815 Gln Ile Thr Ala Val Ser Ala Gln Ile Ser Gly Leu Val Glu Ala Ile 820 825 830 Thr Ser Ala Ala Ile Glu 
Gln Ser Gln Thr Ser Glu Ser Val Thr Gln 835 840 845 Thr Met Ala Leu Val Ala Gln Ile Ala Asp Lys Asn Ser Ser Glu Ala 850 855 860 Ser Gly Val Ser Ala Thr Phe Lys Glu Leu Leu Ala Val Ala Gln Ser 865 870 875 880 Leu Gln Glu Ala Val Lys Gln Phe Lys Val Gln 885 890 6 844 PRT Unknown Description of Unknown Organismcph6 locus SLR12112 (ETR1 homolog; PAS domain) an 844 aa protein. Chromophore domain 461-628. 6 Met Ala Ile Thr Ala Phe Thr Leu Gly Asp Phe Phe Gln Ala Asn Ser 1 5 10 15 Tyr Ile Pro His Gly His Cys Tyr Leu Trp Gln Thr Pro Leu Val Trp 20 25 30 Leu His Val Ser Ala Asp Phe Phe Thr Ala Ile Ala Tyr Tyr Ser Ile 35 40 45 Pro Leu Thr Leu Leu Tyr Phe Leu Arg Lys Arg Gln Asp Ile Pro Phe 50 55 60 Pro Asn Ile Ile Phe Leu Phe Ser Thr Phe Ile Leu Cys Cys Gly Thr 65 70 75 80 Ser His Phe Phe Asp Ile Ile Thr Leu Trp Tyr Pro Ile Tyr Trp Ile 85 90 95 Ser Gly Thr Val Lys Ala Ser Met Ala Ile Val Ser Ile Ile Thr Val 100 105 110 Phe Glu Leu Ile Gln Ile Val Pro Asn Ala Leu Asn Leu Lys Ser Pro 115 120 125 Thr Glu Leu Ala Thr Leu Asn Leu Ala Leu Asn Gln Glu Ile Lys Glu 130 135 140 Arg Gln Thr Ala Glu Ile Ala Leu Gln Glu Leu Asn Asn Asn Leu Glu 145 150 155 160 Lys Arg Val Glu Asp Arg Thr Thr Gln Leu Ala Lys Ile Asn Gln Gln 165 170 175 Leu Glu Gln Glu Ile Glu Asp Lys Thr Arg Ala Lys Glu Asp Leu Glu 180 185 190 Lys Asn Lys Asp Gln Leu Ala Gln Leu Ala Ala Ile Val Glu Ser Ser 195 200 205 Gln Asp Ala Ile Ile Ser Lys Thr Leu Asp Gly Asn Ile Thr Ser Trp 210 215 220 Asn Glu Ser Ala Glu Arg Leu Phe Gly Tyr Thr Ala Glu Glu Met Ile 225 230 235 240 Gly Ser His Ile Thr Lys Leu Ile Pro Glu Glu Leu Ile Leu Glu Glu 245 250 255 Asp Leu Ile Ala Glu Cys Ile Arg Gln Gly Gln Arg Ile Asn Thr Tyr 260 265 270 Glu Thr Gln Arg Gln Arg Lys Asp Gly Thr Lys Ile Asp Val Ala Leu 275 280 285 Thr Ile Ser Pro Ile Arg Asp Glu His Lys Asn Val Val Gly Ala Ser 290 295 300 Lys Ile Val Arg Asp Ile Thr Ala Arg Leu Asp Val Glu Asn Ala Leu 305 310 315 320 Arg Glu Ser Gln Tyr Phe Ile Glu Lys Leu Ala Asn Tyr Ser Pro Gln 325 330 335 
Ile Leu Tyr Ile Leu Asp Pro Ile Ala Trp Lys Asn Ile Tyr Val Asn 340 345 350 Tyr Gln Ser Phe Glu Ile Leu Gly Tyr Thr Pro Glu Glu Phe Lys Asn 355 360 365 Gly Gly Thr Glu Leu Leu Leu Asn Ile Val His Pro Asp Asp Ile Pro 370 375 380 Thr Leu Tyr Glu Asn Lys Asn Phe Trp Gln Lys Ser Gln Glu Gly Gln 385 390 395 400 Val Leu Thr Thr Glu Tyr Arg Met Arg His Lys Asn Gly Ser Trp Arg 405 410 415 Trp Leu Arg Ser Arg Glu Val Val Phe Ala Arg Asp Asp Tyr Gly Gln 420 425 430 Val Thr Lys Val Leu Gly Thr Ala Gln Asp Ile Ser Asp Ser Lys Glu 435 440 445 Gln Glu Gln Arg Leu Tyr Glu Gln Gly Arg Arg Glu Ser Leu Leu Arg 450 455 460 Glu Ile Thr Gln Arg Ile Arg Gln Ser Leu Asp Leu Pro Thr Ile Phe 465 470 475 480 Asn Thr Val Val Gln Glu Ile Arg Gln Phe Leu Glu Ala Asp Arg Val 485 490 495 Val Ile Phe Gln Phe Ser Pro Asp Ser Asp Phe Ser Val Gly Asn Ile 500 505 510 Val Ala Glu Ser Val Leu Ala Pro Phe Lys Pro Ile Ile Asn Ser Ala 515 520 525 Ile Glu Glu Thr Cys Phe Ser Asn Asn Tyr Ala Gln Arg Tyr Gln Gln 530 535 540 Gly Arg Ile Gln Val Ile Glu Asp Ile His Gln Ser His Leu Arg Gln 545 550 555 560 Cys His Ile Asp Phe Leu Ala Arg Leu Gln Val Arg Ala Asn Leu Val 565 570 575 Leu Pro Leu Ile Asn Asp Ala Ile Leu Trp Gly Leu Leu Cys Ile His 580 585 590 Gln Cys Asp Ser Ser Arg Val Trp Glu Gln Thr Glu Ile Asp Leu Leu 595 600 605 Lys Gln Ile Thr Asn Gln Phe Glu Ile Ala Ile Gln Gln Ala Thr Leu 610 615 620 Tyr Glu Gln Ala Gln Gln Glu Leu Ala Ser Lys Asn Gln Leu Phe Val 625 630 635 640 Gln Leu Thr Asn Glu Leu Glu Gln Lys Lys Val Leu Leu Lys Glu Ile 645 650 655 His His Arg Val Lys Asn Asn Leu Gln Ile Met Ser Ser Leu Leu Tyr 660 665 670 Leu Gln Phe Ser Lys Ala Ser Pro Ala Ile Gln Gln Leu Ser Glu Glu 675 680 685 Tyr Gln Asn Arg Ile Gln Ser Met Ala Leu Ile His Glu Gln Leu Tyr 690 695 700 Arg Ser Glu Asp Leu Ala Asn Ile Asp Phe Ser Gln Tyr Leu Lys Asn 705 710 715 720 Leu Thr His Asn Ile Cys Gln Ser Tyr Gly Cys Asn Thr Asp Ser Ile 725 730 735 Lys Ile Lys Leu Leu Val Glu Gln Val Lys Val Pro Leu Glu Gln Ser 740 745 750 Ile 
Pro Leu Gly Leu Ile Ile Gln Glu Leu Val Ser Asn Ala Leu Lys 755 760 765 His Ala Phe Pro Thr Thr Glu Gly Glu Ile Ser Ile Lys Phe Thr Ser 770 775 780 Met Asn Ser His Tyr Ser Leu Gln Val Trp Asp Asn Gly Val Gly Ile 785 790 795 800 Ser Arg Asp Ile Asp Leu Glu Asn Thr Asp Ser Leu Gly Met Gln Leu 805 810 815 Ile Tyr Ser Leu Thr Glu Gln Leu Gln Gly Glu Leu His Tyr Glu Tyr 820 825 830 Val Gly Gly Ala Gln Phe Gly Leu Glu Phe Ser Leu 835 840 7 950 PRT Unknown Description of Unknown Organismcoh7 (locus SLR 1393) a 950 aa protein. Chromophore domain 402-620. Contains a histidine kinase transmitter domain. 7 Met Ser Pro Ser Ser His Gly Thr Ala Val Gln Gln Ala Ile Ala Asp 1 5 10 15 Gln Leu Leu Glu Met Ile Leu Gln Ser Gln Asp Leu His Asn Ala Tyr 20 25 30 Arg Leu Val Val Glu Gly Leu Gln Arg Gly Leu Gly Val Asp Arg Val 35 40 45 Leu Leu Val Gln Asn Ala Val Phe Pro Asn Arg Gln Ser Arg Leu Val 50 55 60 Ala Gln Ala Ile Ala Pro Ala Arg Asp Ile Met Leu Leu Asp Glu Pro 65 70 75 80 Cys Ala Asp Cys Arg Trp Leu His Leu Leu Gly Gln Leu Pro His Tyr 85 90 95 Gly Leu Trp Thr Val Trp Glu Gly Glu Gly Glu Phe Val Gln Leu Asp 100 105 110 Pro Val Gln Gly Glu Phe Cys Arg Thr Leu Gly Ile Lys Ser Leu Leu 115 120 125 His Leu Pro Leu Val Ile Asn Gln Arg His Trp Gly Val Leu Ser Leu 130 135 140 Gln Tyr Leu His Gln Ala Arg Pro Trp Pro Leu Glu Asp Gln Gln Phe 145 150 155 160 Ala Gln Arg Ile Ala His Leu Phe Cys Leu Gly Leu Met Lys Thr Glu 165 170 175 Leu Trp Ile His Cys Gln Asn His Lys Asn Ala Leu Gln Thr Val Val 180 185 190 Ala Glu Gly Gln Val Gln Arg Glu Thr Tyr Leu Lys Ser Ala Gln Arg 195 200 205 Glu Arg Ala Ile Ala Asp Val Ile Asp Lys Ile Arg Phe Ala Leu Asp 210 215 220 Leu Arg Ser Leu Phe Gln Thr Thr Val Thr Glu Val Arg Lys Leu Leu 225 230 235 240 Val Ala Asp Arg Val Met Ile Ile Lys Val Arg Gln Asn Lys Asn Phe 245 250 255 Ser Trp Gly Glu Ile Gln Ala Glu Ala Gln Thr Asp Asp Lys Leu Cys 260 265 270 Leu Leu Pro Pro Lys Glu Arg Val Pro Leu Ser Ser Arg Trp Ile Asp 275 280 285 His Phe Ala Lys Gly Leu Ile Leu 
Ala Met Asp Asp Thr Asp Asp Gln 290 295 300 Arg Ala Asp Phe Asp Gln Ser Met Leu Ala Leu Ala Lys Ala Asn Leu 305 310 315 320 Val Val Pro Leu Phe Ser Gly Asp Arg Leu Trp Gly Val Leu Ser Val 325 330 335 His Gln Cys Asp Gly Pro Arg Val Trp Glu Ser Ser Asp Ile Glu Phe 340 345 350 Ala Leu Lys Ile Ala Leu Asn Leu Gly Val Ala Leu Gln Gln Ala Glu 355 360 365 Leu Leu Thr Glu Ser Gln Arg Arg Ser Thr Ala Leu Gln Ser Ala Leu 370 375 380 Gly Glu Val Glu Ala Gln Lys Asp Tyr Leu Ala Arg Ile Ala Glu Glu 385 390 395 400 Glu Arg Ala Leu Thr Arg Val Ile Glu Gly Ile Arg Gln Thr Leu Glu 405 410 415 Leu Gln Asn Ile Phe Arg Ala Thr Ser Asp Glu Val Arg His Leu Leu 420 425 430 Ser Cys Asp Arg Val Leu Val Tyr Arg Phe Asn Pro Asp Trp Ser Gly 435 440 445 Glu Phe Ile His Glu Ser Val Ala Gln Met Trp Glu Pro Leu Lys Asp 450 455 460 Leu Gln Asn Asn Phe Pro Leu Trp Gln Asp Thr Tyr Leu Gln Glu Asn 465 470 475 480 Glu Gly Gly Arg Tyr Arg Asn His Glu Ser Leu Ala Val Gly Asp Val 485 490 495 Glu Thr Ala Gly Phe Thr Asp Cys His Leu Asp Asn Leu Arg Arg Phe 500 505 510 Glu Ile Arg Ala Phe Leu Thr Val Pro Val Phe Val Gly Glu Gln Leu 515 520 525 Trp Gly Leu Leu Gly Ala Tyr Gln Asn Gly Ala Pro Arg His Trp Gln 530 535 540 Ala Arg Glu Ile His Leu Leu His Gln Ile Ala Asn Gln Leu Gly Val 545 550 555 560 Ala Val Tyr Gln Ala Gln Leu Leu Ala Arg Phe Gln Glu Gln Ser Lys 565 570 575 Thr Met Glu Asn Thr Leu Ala Asp Leu Thr Ala Ile Val Asp Asn Leu 580 585 590 Ala Asp Gly Leu Leu Val Ile Asp Leu Phe Gly Arg Ile Thr Arg Tyr 595 600 605 Asn Pro Ala Leu Leu Ala Met Phe Asp Leu Glu Gly Leu Glu Leu Leu 610 615 620 Gly Ala Gly Val Asp Ala Tyr Phe Pro Glu Thr Leu Asn Gln Leu Leu 625 630 635 640 Ala Lys Pro Glu Arg Glu Glu Gln Lys Leu Val Thr Ala Asp Val Glu 645 650 655 Leu Ser Gln Gly Arg Gln Gly Gln Ala Leu Ile Thr Ser Ile Thr Ser 660 665 670 His Glu Asn Gly Cys Glu Tyr Pro Gln Cys Leu Gly Ala Val Ile Met 675 680 685 Ile Arg Asp Val Thr His Glu Arg Glu Val Glu Arg Met Lys Thr Asp 690 695 700 Phe Leu Ala Thr Val Ser His Glu Leu 
Arg Thr Pro Leu Thr Ser Ile 705 710 715 720 Leu Gly Phe Ala Thr Val Ile Gln Asp Lys Leu Asn Arg Val Ile Ile 725 730 735 Pro Glu Leu Asp Leu Ala Gln Pro His Leu Gly Lys Ala Thr Glu Arg 740 745 750 Val Met Arg Asn Leu Ala Ile Ile Glu Ser Glu Ala Gln Arg Leu Thr 755 760 765 Val Leu Ile Asn Asp Val Leu Asp Ile Ala Lys Met Glu Ala Gly Gln 770 775 780 Glu Ser Trp Gln Glu Gln Pro Cys Ala Ile Gly Pro Ile Ile Glu Arg 785 790 795 800 Ala Ile Ala Thr Ile Thr Pro Gln Ala Gln Lys Lys Asn Ile Ser Leu 805 810 815 Gln Gly Asp Leu Glu Pro Gly Leu Pro Asp Phe Ile Gly Asp Glu Asn 820 825 830 Arg Ile Leu Gln Val Val Leu Asn Leu Leu Ser Asn Ala Val Lys Phe 835 840 845 Thr Pro Lys Gly Leu Ile Thr Ala Arg Ser His Phe His Gln Asn Tyr 850 855 860 Leu Trp Val Glu Ile Ile Asp His Gly Pro Gly Ile His Pro Ala Asp 865 870 875 880 Gln Glu Lys Ile Phe Glu Pro Phe Gln Gln Gly Gly Gly Asp Val Leu 885 890 895 Thr Asp Lys Pro Gln Gly Thr Gly Leu Gly Leu Pro Ile Cys Lys Lys 900 905 910 Ile Val Glu His His Gly Gly Thr Ile Gly Val Asn Ser Ser Leu Gly 915 920 925 Arg Gly Ser Thr Phe Tyr Phe Ser Leu Pro Val Pro Val Pro Ala Val 930 935 940 Glu Thr Ser Pro Ala Val 945 950 8 750 PRT Unknown Description of Unknown Organismcph8 (locus SLR1969) A 750 aa protein. Chromophore domain 156-347. Contains a histidine kinase transmitter domain. 
8 Met Leu Pro Ala Phe Ser Pro Ile Phe Arg Arg Leu Leu Pro Ala Val 1 5 10 15 Thr Phe Glu Arg Leu Leu Arg Phe Trp Arg Thr Leu Ala Gln Gln Thr 20 25 30 Gly Asp Gly Val Gln Cys Phe Val Gly Asp Leu Pro Ser Ser Leu Lys 35 40 45 Pro Pro Pro Gly Pro Ser Val Leu Glu Ala Glu Val Asp His Arg Phe 50 55 60 Ala Leu Leu Val Ser Pro Gly Gln Trp Ala Leu Leu Glu Gly Glu Gln 65 70 75 80 Ile Ser Pro His His Tyr Ala Val Ser Ile Thr Phe Ala Gln Gly Ile 85 90 95 Ile Glu Asp Phe Ile Gln Lys Gln Asn Leu Pro Val Val Ala Glu Ala 100 105 110 Met Pro His Arg Pro Glu Thr Pro Ser Gly Pro Thr Ile Ala Glu Gln 115 120 125 Leu Thr Leu Gly Leu Leu Glu Ile Leu Asn Ser Asp Ser Thr Ser Phe 130 135 140 Ser Pro Glu Pro Ser Leu Gln Asp Ser Leu Gln Ala Ser Gln Val Lys 145 150 155 160 Leu Leu Ser Gln Val Ile Ala Gln Ile Arg Gln Ser Leu Asp Leu Ser 165 170 175 Glu Ile Leu Asn Asn Ala Val Thr Ala Val Gln Lys Phe Leu Phe Val 180 185 190 Asp Arg Leu Val Ile Tyr Gln Phe His Tyr Ser Gln Pro Ser Leu Thr 195 200 205 Pro Leu Glu Glu Asn Gln Ile Pro Ala Pro Arg Pro Arg Gln Gln Tyr 210 215 220 Gly Glu Val Thr Tyr Glu Ala Arg Arg Ser Pro Glu Ile Asp Thr Met 225 230 235 240 Leu Gly Ile Met Thr Glu Asn Asp Cys Phe Ser Gln Val Phe Ser Tyr 245 250 255 Glu Gln Lys Tyr Leu Lys Gly Ala Val Val Ala Val Ser Asp Ile Glu 260 265 270 Asn His Tyr Ser Ser Ser Tyr Cys Leu Val Gly Leu Leu Gln Arg Tyr 275 280 285 Gln Val Arg Ala Lys Leu Val Ala Pro Ile Ile Val Glu Gly Gln Leu 290 295 300 Trp Gly Leu Leu Ile Ala His Gln Cys His His Pro Arg Gln Trp Leu 305 310 315 320 Asp Ser Glu Lys Asn Phe Leu Gly Gln Ile Gly Glu His Leu Ala Val 325 330 335 Ala Ile Val Gln Ser Leu Leu Tyr Ser Glu Val Gln Lys Gln Lys Asn 340 345 350 Asn Phe Glu Lys Arg Val Ile Glu Arg Thr Lys Glu Leu Arg Asp Thr 355 360 365 Leu Met Ala Ala Gln Ala Ala Asn Leu Leu Lys Ser Gln Phe Ile Asn 370 375 380 Asn Ile Ser His Glu Leu Arg Thr Pro Leu Thr Ser Ile Ile Gly Leu 385 390 395 400 Ser Ala Thr Leu Leu Arg Trp Phe Asp His Pro Ala Ser Leu Pro Pro 405 410 415 Ala Lys Gln Gln Tyr 
Tyr Leu Leu Asn Ile Gln Glu Asn Gly Lys Lys 420 425 430 Leu Leu Asp Gln Ile Asn Ser Ile Ile Gln Leu Ser Gln Leu Glu Ser 435 440 445 Gly Gln Thr Ala Leu Asn Cys Gln Ser Phe Ser Leu His Thr Leu Ala 450 455 460 Gln Thr Val Ile His Ser Leu Leu Gly Val Ala Ile Lys Gln Gln Ile 465 470 475 480 Asn Leu Glu Leu Asp Tyr Gln Ile Asn Val Gly Gln Asp Gln Phe Cys 485 490 495 Ala Asp Gln Glu Arg Leu Asp Gln Ile Leu Thr Gln Leu Leu Asn Asn 500 505 510 Ala Leu Lys Phe Thr Pro Ala Glu Gly Thr Val Ile Leu Arg Ile Trp 515 520 525 Lys Glu Ser Asn Gln Ala Ile Phe Gln Val Glu Asp Thr Gly Ile Gly 530 535 540 Ile Asn Glu Gln Gln Leu Pro Val Leu Phe Glu Ala Phe Lys Val Ala 545 550 555 560 Gly Asp Ser Tyr Thr Ser Phe Tyr Glu Thr Gly Gly Val Gly Leu Ala 565 570 575 Leu Thr Lys Gln Leu Val Glu Leu His Gly Gly Tyr Ile Glu Val Glu 580 585 590 Ser Ser Pro Gly Gln Gly Thr Ile Phe Thr Thr Val Ile Pro Gln Gln 595 600 605 Asn Phe Pro Pro Thr Thr Lys Gly Gln Val Gln Asp Lys Leu Asp Ala 610 615 620 Ala Met Pro Phe Asn Ser Ser Val Ile Val Ile Glu Gln Asp Glu Glu 625 630 635 640 Ile Ala Thr Leu Ile Cys Glu Leu Leu Thr Val Ala Asn Tyr Gln Val 645 650 655 Ile Trp Leu Ile Asp Thr Thr Asn Ala Leu Gln Gln Val Glu Leu Leu 660 665 670 Gln Pro Gly Leu Ile Ile Val Asp Gly Asp Phe Val Asp Val Thr Glu 675 680 685 Val Thr Arg Gly Ile Lys Lys Ser Arg Arg Ile Ser Lys Val Thr Val 690 695 700 Phe Leu Leu Ser Glu Ser Leu Ser Ser Ala Glu Trp Gln Ala Leu Ser 705 710 715 720 Gln Lys Gly Ile Asp Asp Tyr Leu Leu Lys Pro Leu Gln Pro Glu Leu 725 730 735 Leu Leu Gln Arg Val Gln Ser Ile Gln Gln Glu Pro Leu Arg 740 745 750 9 196 PRT Unknown Description of Unknown OrganismAtphye 9 Lys Leu Ala Val Arg Ala Ile Ser Arg Leu Gln Ser Leu Pro Gly Gly 1 5 10 15 Asp Ile Gly Ala Leu Cys Asp Thr Val Val Glu Asp Val Gln Arg Leu 20 25 30 Thr Gly Tyr Asp Arg Val Met Val Tyr Gln Phe His Glu Asp Asp His 35 40 45 Gly Glu Val Val Ser Glu Ile Arg Arg Ser Asp Leu Glu Pro Tyr Leu 50 55 60 Gly Leu His Tyr Pro Ala Thr Asp Ile Pro Gln Ala Ala Arg Phe Leu 
65 70 75 80 Phe Lys Gln Asn Arg Val Arg Met Ile Cys Asp Cys Asn Ala Thr Pro 85 90 95 Val Lys Val Val Gln Ser Glu Glu Leu Lys Arg Pro Leu Cys Leu Val 100 105 110 Asn Ser Thr Leu Arg Ala Pro His Gly Cys His Thr Gln Tyr Met Ala 115 120 125 Asn Met Gly Ser Val Ala Ser Leu Ala Leu Ala Ile Val Val Lys Gly 130 135 140 Lys Asp Ser Ser Lys Leu Trp Gly Leu Val Val Gly His His Cys Ser 145 150 155 160 Pro Arg Tyr Val Pro Phe Pro Leu Arg Tyr Ala Cys Glu Phe Leu Met 165 170 175 Gln Ala Phe Gly Leu Gln Leu Gln Met Glu Leu Gln Leu Ala Ser Gln 180 185 190 Leu Ala Glu Lys 195 10 207 PRT Unknown Description of Unknown OrganismAtphyb 10 Lys Leu Ala Val Arg Ala Ile Ser Gln Leu Gln Ala Leu Pro Gly Gly 1 5 10 15 Asp Ile Lys Leu Leu Cys Asp Thr Val Val Glu Ser Val Arg Asp Leu 20 25 30 Thr Gly Tyr Asp Arg Val Met Val Tyr Lys Phe His Glu Asp Glu His 35 40 45 Gly Glu Val Val Ala Glu Ser Lys Arg Asp Asp Leu Glu Pro Tyr Ile 50 55 60 Gly Leu His Tyr Pro Ala Thr Asp Ile Pro Gln Ala Ser Arg Phe Leu 65 70 75 80 Phe Lys Gln Asn Arg Val Arg Met Ile Val Asp Cys Asn Ala Thr Pro 85 90 95 Val Leu Val Val Gln Asp Asp Arg Leu Thr Gln Ser Met Cys Leu Val 100 105 110 Gly Ser Thr Leu Arg Ala Pro His Gly Cys His Ser Gln Tyr Met Ala 115 120 125 Asn Met Gly Ser Ile Ala Ser Leu Ala Met Ala Val Ile Ile Asn Gly 130 135 140 Asn Glu Asp Asp Gly Ser Asn Val Ala Ser Gly Arg Ser Ser Met Arg 145 150 155 160 Leu Trp Gly Leu Val Val Cys His His Thr Ser Ser Arg Cys Ile Pro 165 170 175 Phe Pro Leu Arg Tyr Ala Cys Glu Phe Leu Met Gln Ala Phe Gly Leu 180 185 190 Gln Leu Asn Met Glu Leu Gln Leu Ala Leu Gln Met Ser Glu Lys 195 200 205 11 210 PRT Unknown Description of Unknown OrganismMcphy1b 11 Lys Leu Ala Ala Lys Ala Ile Ser Arg Leu Gln Ser Leu Pro Gly Gly 1 5 10 15 Asp Ile Gly Leu Leu Cys Asp Ala Val Val Glu Glu Val Arg Glu Leu 20 25 30 Thr Gly Tyr Asp Arg Val Met Ala Tyr Lys Phe His Glu Asp Glu His 35 40 45 Gly Glu Val Ile Ala Glu Ile Arg Arg Ser Asp Leu Glu Pro Tyr Leu 50 55 60 Gly Leu His Tyr Pro Ala Thr Asp Ile Pro Gln 
Ala Ala Arg Phe Leu 65 70 75 80 Phe Met Lys Asn Arg Val Arg Ile Ile Cys Asp Cys Ser Ala Pro Pro 85 90 95 Val Lys Val Ile Gln Asp Pro Thr Met Lys His Pro Ile Ser Leu Ala 100 105 110 Gly Ser Thr Leu Arg Gly Val His Gly Cys His Ala Gln Tyr Met Ala 115 120 125 Asn Met Gly Ser Val Ala Ser Leu Val Met Ala Val Ile Ile Asn Asp 130 135 140 Asn Ser Ser Glu Glu Gly Ala Thr Ala Ala Gly Gly Ile Leu His Lys 145 150 155 160 Gly Arg Lys Leu Trp Gly Leu Val Val Cys His His Ser Ser Pro Arg 165 170 175 Tyr Val Pro Phe Pro Leu Arg Ser Ala Cys Glu Phe Leu Met Gln Val 180 185 190 Phe Gly Leu Gln Leu Asn Met Glu Val Glu Leu Ser Ser Gln Leu Arg 195 200 205 Glu Lys 210 12 206 PRT Unknown Description of Unknown OrganismAtphyc 12 Lys Leu Ala Ala Lys Ser Ile Ser Arg Leu Gln Ala Leu Pro Ser Gly 1 5 10 15 Asn Met Leu Leu Leu Cys Asp Ala Leu Val Lys Glu Val Ser Glu Leu 20 25 30 Thr Gly Tyr Asp Arg Val Met Val Tyr Lys Phe His Glu Asp Gly His 35 40 45 Gly Glu Val Ile Ala Glu Cys Cys Arg Glu Asp Met Glu Pro Tyr Leu 50 55 60 Gly Leu His Tyr Ser Ala Thr Asp Ile Pro Gln Ala Ser Arg Phe Leu 65 70 75 80 Phe Met Arg Asn Lys Val Arg Met Ile Cys Asp Cys Ser Ala Val Pro 85 90 95 Val Lys Val Val Gln Asp Lys Ser Leu Ser Gln Pro Ile Ser Leu Ser 100 105 110 Gly Ser Thr Leu Arg Ala Pro His Gly Cys His Ala Gln Tyr Met Ser 115 120 125 Asn Met Gly Ser Val Ala Ser Leu Val Met Ser Val Thr Ile Asn Gly 130 135 140 Ser Asp Ser Asp Glu Met Asn Arg Asp Leu Gln Thr Gly Arg His Leu 145 150 155 160 Trp Gly Leu Val Val Cys His His Ala Ser Pro Arg Phe Val Pro Phe 165 170 175 Pro Leu Arg Tyr Ala Cys Glu Phe Leu Thr Gln Val Phe Gly Val Gln 180 185 190 Ile Asn Lys Glu Ala Glu Ser Ala Val Leu Leu Lys Glu Lys 195 200 205 13 210 PRT Unknown Description of Unknown OrganismAtphya 13 Lys Leu Ala Ala Lys Ala Ile Thr Arg Leu Gln Ser Leu Pro Ser Gly 1 5 10 15 Ser Met Glu Arg Leu Cys Asp Thr Met Val Gln Glu Val Phe Glu Leu 20 25 30 Thr Gly Tyr Asp Arg Val Met Ala Tyr Lys Phe His Glu Asp Asp His 35 40 45 Gly Glu Val Val Ser Glu Val Thr Lys 
Pro Gly Leu Glu Pro Tyr Leu 50 55 60 Gly Leu His Tyr Pro Ala Thr Asp Ile Pro Gln Ala Ala Arg Phe Leu 65 70 75 80 Phe Met Lys Asn Lys Val Arg Met Ile Val Asp Cys Asn Ala Lys His 85 90 95 Ala Arg Val Leu Gln Asp Glu Lys Leu Ser Phe Asp Leu Thr Leu Cys 100 105 110 Gly Ser Thr Leu Arg Ala Pro His Ser Cys His Leu Gln Tyr Met Ala 115 120 125 Asn Met Asp Ser Ile Ala Ser Leu Val Met Ala Val Val Val Asn Glu 130 135 140 Glu Asp Gly Glu Gly Asp Ala Pro Asp Ala Thr Thr Gln Pro Gln Lys 145 150 155 160 Arg Lys Arg Leu Trp Gly Leu Val Val Cys His Asn Thr Thr Pro Arg 165 170 175 Phe Val Pro Phe Pro Leu Arg Tyr Ala Cys Glu Phe Leu Ala Gln Val 180 185 190 Phe Ala Ile His Val Asn Lys Glu Val Glu Leu Asp Asn Gln Met Val 195 200 205 Glu Lys 210 14 192 PRT Unknown Description of Unknown Organismslr0473 14 His Met Ala Asn Ala Ala Leu Asn Arg Leu Arg Gln Gln Ala Asn Leu 1 5 10 15 Arg Asp Phe Tyr Asp Val Ile Val Glu Glu Val Arg Arg Met Thr Gly 20 25 30 Phe Asp Arg Val Met Leu Tyr Arg Phe Asp Glu Asn Asn His Gly Asp 35 40 45 Val Ile Ala Glu Asp Lys Arg Asp Asp Met Glu Pro Tyr Leu Gly Leu 50 55 60 His Tyr Pro Glu Ser Asp Ile Pro Gln Pro Ala Arg Arg Leu Phe Ile 65 70 75 80 His Asn Pro Ile Arg Val Ile Pro Asp Val Tyr Gly Val Ala Val Pro 85 90 95 Leu Thr Pro Ala Val Asn Pro Ser Thr Asn Arg Ala Val Asp Leu Thr 100 105 110 Glu Ser Ile Leu Arg Ser Ala Tyr His Cys His Leu Thr Tyr Leu Lys 115 120 125 Asn Met Gly Val Gly Ala Ser Leu Thr Ile Ser Leu Ile Lys Asp Gly 130 135 140 His Leu Trp Gly Leu Ile Ala Cys His His Gln Thr Pro Lys Val Ile 145 150 155 160 Pro Phe Glu Leu Arg Lys Ala Cys Glu Phe Phe Gly Arg Val Val Phe 165 170 175 Ser Asn Ile Ser Ala Gln Glu Asp Thr Glu Thr Phe Asp Tyr Arg Val 180 185 190 15 177 PRT Unknown Description of Unknown Organismsl111473 15 Arg Phe Ile Asn Gln Ile Thr Gln His Ile Arg Gln Ser Leu Asn Leu 1 5 10 15 Glu Thr Val Leu Asn Thr Thr Val Ala Glu Val Lys Thr Leu Leu Gln 20 25 30 Val Asp Arg Val Leu Ile Tyr Arg Ile Trp Gln Asp Gly Thr Gly Ser 35 40 45 Ala Ile Thr Glu Ser 
Val Asn Ala Asn Tyr Pro Ser Ile Leu Gly Arg 50 55 60 Thr Phe Ser Asp Glu Val Phe Pro Val Glu Tyr His Gln Ala Tyr Thr 65 70 75 80 Lys Gly Lys Val Arg Ala Ile Asn Asp Ile Asp Gln Asp Asp Ile Glu 85 90 95 Ile Cys Leu Ala Asp Phe Val Lys Gln Phe Gly Val Lys Ser Lys Leu 100 105 110 Val Val Pro Ile Leu Gln His Asn Arg Ala Ser Ser Leu Asp Asn Glu 115 120 125 Ser Glu Phe Pro Tyr Leu Trp Gly Leu Leu Ile Thr His Gln Cys Ala 130 135 140 Phe Thr Arg Pro Trp Gln Pro Trp Glu Val Glu Leu Met Lys Gln Leu 145 150 155 160 Ala Asn Gln Val Ala Ile Ala Ile Gln Gln Ser Glu Leu Tyr Glu Gln 165 170 175 Leu 16 173 PRT Unknown Description of Unknown OrganismRcae 16 Glu Leu Phe Ser Glu Val Thr Leu Lys Ile Arg Gln Ser Leu Gln Leu 1 5 10 15 Lys Glu Ile Leu His Thr Thr Val Thr Glu Val Gln Arg Ile Leu Gln 20 25 30 Ala Asp Arg Val Leu Ile Tyr His Val Leu Pro Asp Gly Thr Gly Lys 35 40 45 Thr Ile Ser Glu Ser Val Leu Pro Asp Tyr Pro Thr Leu Met Asp Leu 50 55 60 Glu Phe Pro Gln Glu Val Phe Pro Gln Glu Tyr Gln Gln Leu Tyr Ala 65 70 75 80 Gln Gly Arg Val Arg Ala Ile Ala Asp Val His Asp Pro Thr Ala Gly 85 90 95 Leu Ala Glu Cys Leu Val Glu Phe Val Asp Gln Phe His Ile Lys Ala 100 105 110 Lys Leu Ile Val Pro Ile Val Gln Asn Leu Asn Ala Asn Ser Gln Asn 115 120 125 Gln Leu Trp Gly Leu Leu Ile Ala His Gln Cys Asp Ser Val Arg Gln 130 135 140 Trp Val Asp Phe Glu Leu Glu Leu Met Gln Gln Leu Ala Asp Gln Ile 145 150 155 160 Ser Ile Ala Leu Ser Gln Ala Gln Leu Leu Gly Arg Leu 165 170 17 168 PRT Unknown Description of Unknown Organismslr1212 17 Ser Leu Leu Arg Glu Ile Thr Gln Arg Ile Arg Gln Ser Leu Asp Leu 1 5 10 15 Pro Thr Ile Phe Asn Thr Val Val Gln Glu Ile Arg Gln Phe Leu Glu 20 25 30 Ala Asp Arg Val Val Ile Phe Gln Phe Ser Pro Asp Ser Asp Phe Ser 35 40 45 Val Gly Asn Ile Val Ala Glu Ser Val Leu Ala Pro Phe Lys Pro Ile 50 55 60 Ile Asn Ser Ala Ile Glu Glu Thr Cys Phe Ser Asn Asn Tyr Ala Gln 65 70 75 80 Arg Tyr Gln Gln Gly Arg Ile Gln Val Ile Glu Asp Ile His Gln Ser 85 90 95 His Leu Arg Gln Cys His Ile Asp Phe 
Leu Ala Arg Leu Gln Val Arg 100 105 110 Ala Asn Leu Val Leu Pro Leu Ile Asn Asp Ala Ile Leu Trp Gly Leu 115 120 125 Leu Cys Ile His Gln Cys Asp Ser Ser Arg Val Trp Glu Gln Thr Glu 130 135 140 Ile Asp Leu Leu Lys Gln Ile Thr Asn Gln Phe Glu Ile Ala Ile Gln 145 150 155 160 Gln Ala Thr Leu Tyr Glu Gln Ala 165 18 165 PRT Unknown Description of Unknown Organismsl110821b 18 Lys Leu Val Leu Lys Ile Ala Asn Lys Ile Arg Ala Ser Leu Asn Ile 1 5 10 15 Asn Asp Ile Leu Tyr Ser Thr Val Thr Glu Val Arg Gln Phe Leu Asn 20 25 30 Thr Asp Arg Val Val Leu Phe Lys Phe Asn Ser Gln Trp Ser Gly Gln 35 40 45 Val Val Thr Glu Ser His Asn Asp Phe Cys Arg Ser Ile Ile Asn Asp 50 55 60 Glu Ile Asp Asp Pro Cys Phe Lys Gly His Tyr Leu Arg Leu Tyr Arg 65 70 75 80 Glu Gly Arg Val Arg Ala Val Ser Asp Ile Glu Lys Ala Asp Leu Ala 85 90 95 Asp Cys His Lys Glu Leu Leu Arg His Tyr Gln Val Lys Ala Asn Leu 100 105 110 Val Val Pro Val Val Phe Asn Glu Asn Leu Trp Gly Leu Leu Ile Ala 115 120 125 His Glu Cys Lys Thr Pro Arg Tyr Trp Gln Glu Glu Asp Leu Gln Leu 130 135 140 Leu Met Glu Leu Ala Thr Gln Val Ala Ile Ala Ile His Gln Gly Glu 145 150 155 160 Leu Tyr Glu Gln Leu 165 19 165 PRT Unknown Description of Unknown Organismsl111124 19 Lys Leu Leu Ser Ser Ile Ser Gln Arg Ile Arg Glu Ser Leu Lys Leu 1 5 10 15 Glu Thr Ile Leu Arg Thr Thr Val Thr Glu Val Arg Arg Thr Ile His 20 25 30 Ala Asp Arg Val Leu Ile His His Ile Gln Glu Asp Gly Leu Gly Thr 35 40 45 Thr Ile Ala Glu Ser Val Val Asn Gly Gln Pro Ser Val Met Gln Met 50 55 60 Asp Leu Ser Pro Glu Ser Phe Pro Pro Glu Cys Tyr Gln Arg Tyr Leu 65 70 75 80 Asn Gly Tyr Ile Tyr Ala Ser Arg Asp Gln Leu Pro Asp Cys Ala Ile 85 90 95 Asn Cys Ala Val Gln Cys Phe Thr Val Ala Glu Ser Gln Ser Arg Ile 100 105 110 Val Ala Pro Ile Val Phe Asp His Ser Leu Trp Gly Leu Leu Ile Val 115 120 125 His Gln Cys Ser Ser Ser Arg Thr Trp Gln Thr Ala Glu Ile Gln Leu 130 135 140 Met Gln Ser Leu Gly Asn Gln Leu Ala Ile Ala Ile Gln Gln Ser Leu 145 150 155 160 Leu Tyr Glu Arg Leu 165 20 165 PRT Unknown 
Description of Unknown Organismsl110041 20 Gln Ile Leu Lys Glu Leu Thr Leu Lys Ile Ser Ala Ala Ile Asn Ser 1 5 10 15 Glu Gln Val Phe Asp Ile Ala Ala Gln Glu Ile Arg Leu Ala Leu Lys 20 25 30 Ala Asp Arg Val Ile Val Tyr Arg Phe Asp Ala Thr Trp Ala Gly Thr 35 40 45 Val Ile Val Glu Ser Val Ala Glu Gly Tyr Pro Lys Ala Leu Gly Ala 50 55 60 Thr Ile Ala Asp Pro Cys Phe Ala Asp Ser Tyr Val Glu Lys Tyr Arg 65 70 75 80 Ser Gly Arg Ile Gln Ala Thr Arg Asp Ile Tyr Asn Ala Gly Leu Thr 85 90 95 Pro Cys His Ile Gly Gln Leu Lys Pro Phe Glu Val Lys Ala Asn Leu 100 105 110 Val Ala Pro Ile Asn Tyr Lys Gly Asn Leu Leu Gly Leu Leu Ile Ala 115 120 125 His Gln Cys Ser Gly Pro Arg Asp Trp His Gln Asn Glu Ile Asp Leu 130 135 140 Phe Gly Gln Leu Thr Val Gln Val Gly Leu Ala Leu Glu Arg Ser Asp 145 150 155 160 Leu Leu Ala Gln Gln 165 21 170 PRT Unknown Description of Unknown Organismslr1393 21 Arg Ala Leu Thr Arg Val Ile Glu Gly Ile Arg Gln Thr Leu Glu Leu 1 5 10 15 Gln Asn Ile Phe Arg Ala Thr Ser Asp Glu Val Arg His Leu Leu Ser 20 25 30 Cys Asp Arg Val Leu Val Tyr Arg Phe Asn Pro Asp Trp Ser Gly Glu 35 40 45 Phe Ile His Glu Ser Val Ala Gln Met Trp Glu Pro Leu Lys Asp Leu 50 55 60 Gln Asn Asn Phe Pro Leu Trp Gln Asp Thr Tyr Leu Gln Glu Asn Glu 65 70 75 80 Gly Gly Arg Tyr Arg Asn His Glu Ser Leu Ala Val Gly Asp Val Glu 85 90 95 Thr Ala Gly Phe Thr Asp Cys His Leu Asp Asn Leu Arg Arg Phe Glu 100 105 110 Ile Arg Ala Phe Leu Thr Val Pro Val Phe Val Gly Glu Gln Leu Trp 115 120 125 Gly Leu Leu Gly Ala Tyr Gln Asn Gly Ala Pro Arg His Trp Gln Ala 130 135 140 Arg Glu Ile His Leu Leu His Gln Ile Ala Asn Gln Leu Gly Val Ala 145 150 155 160 Val Tyr Gln Ala Gln Leu Leu Ala Arg Phe 165 170 22 188 PRT Unknown Description of Unknown Organismsl111969 22 Lys Leu Leu Ser Gln Val Ile Ala Gln Ile Arg Gln Ser Leu Asp Leu 1 5 10 15 Ser Glu Ile Leu Asn Asn Ala Val Thr Ala Val Gln Lys Phe Leu Phe 20 25 30 Val Asp Arg Leu Val Ile Tyr Gln Phe His Tyr Ser Gln Pro Ser Leu 35 40 45 Thr Pro Leu Glu Glu Asn Gln Ile Pro Ala 
Pro Arg Pro Arg Gln Gln 50 55 60 Tyr Gly Glu Val Thr Tyr Glu Ala Arg Arg Ser Pro Glu Ile Asp Thr 65 70 75 80 Met Leu Gly Ile Met Thr Glu Asn Asp Cys Phe Ser Gln Val Phe Ser 85 90 95 Tyr Glu Gln Lys Tyr Leu Lys Gly Ala Val Val Ala Val Ser Asp Ile 100 105 110 Glu Asn His Tyr Ser Ser Ser Tyr Cys Leu Val Gly Leu Leu Gln Arg 115 120 125 Tyr Gln Val Arg Ala Lys Leu Val Ala Pro Ile Ile Val Glu Gly Gln 130 135 140 Leu Trp Gly Leu Leu Ile Ala His Gln Cys His His Pro Arg Gln Trp 145 150 155 160 Leu Asp Ser Glu Lys Asn Phe Leu Gly Gln Ile Gly Glu His Leu Ala 165 170 175 Val Ala Ile Val Gln Ser Leu Leu Tyr Ser Glu Val 180 185 23 187 PRT Unknown Description of Unknown Organismsl110821a 23 Asp Phe Leu Arg Asn Val Ile Asn Lys Phe His Arg Ala Leu Thr Leu 1 5 10 15 Arg Glu Thr Leu Gln Val Ile Val Glu Glu Ala Arg Ile Phe Leu Gly 20 25 30 Val Asp Arg Val Lys Ile Tyr Lys Phe Ala Ser Asp Gly Ser Gly Glu 35 40 45 Val Leu Ala Glu Ala Val Asn Arg Ala Ala Leu Pro Ser Leu Leu Gly 50 55 60 Leu His Phe Pro Val Glu Asp Ile Pro Pro Gln Ala Arg Glu Glu Leu 65 70 75 80 Gly Asn Gln Arg Lys Met Ile Ala Val Asp Val Ala His Arg Arg Lys 85 90 95 Lys Ser His Glu Leu Ser Gly Arg Ile Ser Pro Thr Glu His Ser Asn 100 105 110 Gly His Tyr Thr Thr Val Asp Ser Cys His Ile Gln Tyr Leu Leu Ala 115 120 125 Met Gly Val Leu Ser Ser Leu Thr Val Pro Val Met Gln Asp Gln Gln 130 135 140 Leu Trp Gly Ile Met Ala Val His His Ser Lys Pro Arg Arg Phe Thr 145 150 155 160 Glu Gln Glu Trp Glu Thr Met Ala Leu Leu Ser Lys Glu Val Ser Leu 165 170 175 Ala Ile Thr Gln Ser Gln Leu Ser Arg Gln Val 180 185 24 210 PRT Artificial Sequence Description of Artificial SequenceCph2-N197 24 Met Asn Pro Asn Arg Ser Leu Glu Asp Phe Leu Arg Asn Val Ile Asn 1 5 10 15 Lys Phe His Arg Ala Leu Thr Leu Arg Glu Thr Leu Gln Val Ile Val 20 25 30 Glu Glu Ala Arg Ile Phe Leu Gly Val Asp Arg Val Lys Ile Tyr Lys 35 40 45 Phe Ala Ser Asp Gly Ser Gly Glu Val Leu Ala Glu Ala Val Asn Arg 50 55 60 Ala Ala Leu Pro Ser Leu Gly Leu His Phe Pro Val Glu Asp Ile Pro 
65 70 75 80 Pro Gln Ala Arg Glu Glu Leu Gly Asn Gln Arg Lys Met Ile Ala Val 85 90 95 Asp Val Ala His Arg Arg Lys Lys Ser His Glu Leu Ser Gly Arg Ile 100 105 110 Ser Pro Thr Glu His Ser Asn Gly His Tyr Thr Thr Val Asp Ser Cys 115 120 125 His Ile Gln Tyr Leu Leu Ala Met Gly Val Leu Ser Leu Thr Val Pro 130 135 140 Val Met Gln Asp Gln Gln Leu Trp Gly Ile Met Ala Val His His Ser 145 150 155 160 Lys Pro Arg Arg Phe Thr Glu Gln Glu Trp Glu Thr Met Ala Leu Leu 165 170 175 Ser Lys Glu Val Ser Leu Ala Ile Thr Gln Ser Gln Leu Ser Arg Gln 180 185 190 Val His Gln Gly Arg Pro Ala Gly Ser Ala Trp Arg His Pro Gln Phe 195 200 205 Gly Gly 210 
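The listing above prints residues as three-letter codes interleaved with position numbers, which makes lengths hard to check by eye. The following is a minimal sketch, not part of the original disclosure, of how such a fragment might be converted to one-letter codes and a declared length compared with the range recited in the claims. The function names are hypothetical, and treating "about 190" and "about 400" as hard limits of 190 and 400 is an assumption made only for illustration.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(fragment: str) -> str:
    """Convert a flattened listing fragment to a one-letter string.

    Purely numeric tokens are treated as position markers and skipped.
    (Hypothetical helper; not defined anywhere in the source document.)
    """
    residues = []
    for token in fragment.split():
        if token.isdigit():
            continue  # position marker such as 1, 5, 10, 15
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized token: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def within_claimed_length(length: int, low: int = 190, high: int = 400) -> bool:
    """Check a length against the claimed range.

    Assumption: "about 190" and "about 400" are read as hard limits here.
    """
    return low <= length <= high


# First row of SEQ ID NO: 24 as printed in the listing above.
fragment = (
    "Met Asn Pro Asn Arg Ser Leu Glu Asp Phe Leu Arg Asn Val Ile Asn "
    "1 5 10 15"
)
one_letter = listing_to_one_letter(fragment)
print(one_letter)  # MNPNRSLEDFLRNVIN

# SEQ ID NO: 24 is declared in the listing as 210 residues long.
print(within_claimed_length(210))  # True
```

Under these assumptions, the 210-residue SEQ ID NO: 24 falls inside the recited range, while the shorter fragments in the listing (for example, the 170-residue SEQ ID NO: 21) would not.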

What is claimed is:
 1. A composition comprising an apoprotein polypeptide of between about 190 amino acids and about 400 amino acids, which apoprotein polypeptide comprises a lyase domain.
 2. The composition of claim 1, wherein the apoprotein polypeptide is selected from the group consisting of a plant apoprotein, an algal apoprotein, and a cyanobacterial apoprotein.
 3. The composition of claim 1, wherein the apoprotein polypeptide consists of about 390 amino acids.
 4. The composition of claim 1, wherein the apoprotein polypeptide consists of about 200 amino acids.
 5. The composition of claim 4, wherein the apoprotein polypeptide is as shown in SEQ ID NO: 9.
 6. The composition of claim 1, wherein the apoprotein polypeptide consists of a lyase domain.
 7. The composition of claim 1, wherein the apoprotein polypeptide is from Synechocystis sp.
 8. The composition of claim 7, wherein the apoprotein polypeptide is Cph2.
 9. The composition of claim 1, wherein the apoprotein polypeptide is covalently linked to a bilin to form a fluorescent adduct.
 10. The composition of claim 9, wherein the bilin is a tetrapyrrole.
 11. The composition of claim 10, wherein the bilin is phycoerythrobilin.
 12. The composition of claim 9, wherein the fluorescent adduct is linked to a biomolecule.
 13. The composition of claim 12, wherein the biomolecule is selected from the group consisting of a protein, a carbohydrate, a lipid, and a nucleic acid.
 14. The composition of claim 13, wherein the biomolecule is a nucleic acid.
 15. The composition of claim 13, wherein the biomolecule is a protein.
 16. The composition of claim 15, wherein the protein is an antibody.
 17. A method of detecting the presence of a biomolecule in a sample, the method comprising: providing a sample comprising a biomolecule linked to a fluorescent adduct consisting of a bilin and an apoprotein polypeptide of between about 190 amino acids and about 400 amino acids, which apoprotein polypeptide comprises a lyase domain; contacting the sample with light which causes the fluorescent adduct to emit light; and detecting the emitted light, thereby detecting the presence of the biomolecule.
 18. The method of claim 17, wherein the step of contacting the sample with light includes contacting the sample with light having a wavelength of about 570 nm.
 19. The method of claim 17, wherein the step of detecting the emitted light includes detecting light having a wavelength of about 590 nm.
 20. The method of claim 17, wherein the apoprotein polypeptide is selected from the group consisting of a plant apoprotein, an algal apoprotein, and a cyanobacterial apoprotein.
 21. The method of claim 17, wherein the apoprotein polypeptide consists of a lyase domain.
 22. The method of claim 17, wherein the apoprotein polypeptide consists of about 390 amino acids.
 23. The method of claim 17, wherein the apoprotein polypeptide consists of about 200 amino acids.
 24. The method of claim 23, wherein the apoprotein polypeptide is as shown in SEQ ID NO: 9.
 25. The method of claim 17, wherein the apoprotein polypeptide is from Synechocystis sp.
 26. The method of claim 25, wherein the apoprotein polypeptide is Cph2.
 27. The method of claim 17, wherein the bilin is a tetrapyrrole.
 28. The method of claim 27, wherein the bilin is phycoerythrobilin.
 29. The method of claim 17, wherein the biomolecule is selected from the group consisting of a protein, a carbohydrate, a lipid, and a nucleic acid.
 30. The method of claim 29, wherein the biomolecule is a nucleic acid.
 31. The method of claim 29, wherein the biomolecule is a protein.
 32. The method of claim 31, wherein the protein is an antibody. 